An Optimal Decision Tree Model for Diabetes Diagnosis

Decision tree models have earned a special status in predictive modeling because they are considered comprehensible for human analysis and insight. The Classification and Regression Tree (CART) algorithm is one of the best-known decision tree induction algorithms and addresses both classification and regression problems. Finding optimal values for the hyperparameters of a decision tree construction algorithm is a challenging issue. To build an effective decision tree classifier with high accuracy and comprehensibility, we need to set optimal values for hyperparameters such as the maximum size of the tree, the minimum number of instances required in a node for inducing a split, the node splitting criterion, and the amount of pruning. The hyperparameter setting influences the performance of the decision tree model, and no single setting works equally well for different datasets: a setting that gives an optimal decision tree for one dataset may produce a sub-optimal model for another. In this paper, we present a hyper-heuristic approach for tuning the hyperparameters of Recursive and Partition Trees (rpart), which is a typical implementation of CART in the statistical and data analytics package R. We employ an evolutionary algorithm as the hyper-heuristic for tuning the hyperparameters of the decision tree classifier. The approach is named Hyper-heuristic Evolutionary Approach with Recursive and Partition Trees (HEARpart). The proposed approach is validated on 30 datasets. Statistical tests show that HEARpart performs significantly better than WEKA’s J48 algorithm in terms of error rate, F-measure, and tree size. Further, the suggested hyper-heuristic algorithm constructs significantly more comprehensible models than WEKA’s J48, CART, and other similar decision tree construction strategies. The results also show that the accuracy achieved by the hyper-heuristic approach is slightly lower than that of the other approaches compared.
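
The idea of using an evolutionary algorithm to search over decision tree hyperparameters can be sketched as follows. This is a minimal illustration in Python, not the HEARpart implementation: the search ranges, the operators (truncation selection, uniform crossover, single-gene mutation, elitism), and the function names are all assumptions made for this example. The names minsplit, maxdepth, and cp mirror rpart's control parameters, and toy_fitness is a hypothetical placeholder for whatever cross-validated error and tree-size measure a real run would compute.

```python
import random

# Search ranges for three rpart.control-style hyperparameters.
# The ranges are illustrative assumptions, not values from the paper.
BOUNDS = {
    "minsplit": (2, 60),   # min instances in a node to attempt a split (int)
    "maxdepth": (1, 30),   # maximum tree depth (int)
    "cp": (0.0, 0.1),      # complexity parameter controlling pruning (float)
}

def random_candidate(rng):
    """Draw one hyperparameter setting uniformly within BOUNDS."""
    return {
        "minsplit": rng.randint(*BOUNDS["minsplit"]),
        "maxdepth": rng.randint(*BOUNDS["maxdepth"]),
        "cp": rng.uniform(*BOUNDS["cp"]),
    }

def mutate(candidate, rng):
    """Resample one randomly chosen hyperparameter, keeping the others."""
    child = dict(candidate)
    key = rng.choice(sorted(BOUNDS))
    child[key] = random_candidate(rng)[key]
    return child

def crossover(a, b, rng):
    """Uniform crossover: each hyperparameter is copied from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}

def evolve(fitness, generations=20, pop_size=12, elite=2, seed=0):
    """Minimise `fitness` over BOUNDS; returns (best setting, its fitness)."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = list(ranked[:elite])     # elitism: best settings survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    best = min(population, key=fitness)
    return best, fitness(best)

def toy_fitness(setting):
    """Hypothetical stand-in for cross-validated error plus a size penalty."""
    return (abs(setting["minsplit"] - 20) / 60
            + abs(setting["maxdepth"] - 5) / 30
            + abs(setting["cp"] - 0.01) * 10)

best_setting, best_score = evolve(toy_fitness)
print(best_setting, round(best_score, 4))
```

In an actual experiment, toy_fitness would be replaced by a function that fits a tree with the given setting on each dataset and returns the quantity to minimise. Because the best settings are carried over unchanged, the best fitness in the population never gets worse from one generation to the next.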

A novel enhanced decision tree model for detecting chronic kidney disease

Network Modeling Analysis in Health Informatics and Bioinformatics ◽ 10.1007/s13721-021-00302-w ◽ 2021 ◽ Vol 10 (1)

Author(s): Avijit Kumar Chaudhuri ◽ Deepankar Sinha ◽ Dilip K. Banerjee ◽ Anirban Das

Keyword(s): Chronic Kidney Disease ◽ Kidney Disease ◽ Decision Tree ◽ Decision Tree Model ◽ Tree Model

Risk of Pre-Malignancy or Malignancy in Postmenopausal Endometrial Polyps: A CHAID Decision Tree Analysis

Diagnostics ◽ 10.3390/diagnostics11061094 ◽ 2021 ◽ Vol 11 (6) ◽ pp. 1094

Author(s): Michael Wong ◽ Nikolaos Thanatsis ◽ Federica Nardelli ◽ Tejal Amin ◽ Davor Jurkovic

Keyword(s): Decision Tree ◽ Expectant Management ◽ Decision Tree Model ◽ Decision Tree Analysis ◽ Focal Lesions ◽ Tree Model ◽ Normal Endometrium ◽ Endometrial Polyps ◽ Tree Analysis ◽ Interaction Detection

Background and aims: Postmenopausal endometrial polyps are commonly managed by surgical resection; however, expectant management may be considered for some women because of medical co-morbidities, failed hysteroscopies, or patient preference. This study aimed to identify patient characteristics and ultrasound morphological features of polyps that could aid in predicting underlying pre-malignancy or malignancy in postmenopausal polyps.
Methods: Consecutive women with postmenopausal polyps diagnosed on ultrasound and removed surgically were recruited prospectively between October 2015 and October 2018. Polyps were defined on ultrasound as focal lesions with a regular outline, surrounded by normal endometrium. On Doppler examination, there was either a single feeder vessel or no detectable vascularity. Polyps were classified histologically as benign (including hyperplasia without atypia), pre-malignant (atypical hyperplasia), or malignant. A Chi-squared automatic interaction detection (CHAID) decision tree analysis was performed with a range of demographic, clinical, and ultrasound variables as independent variables and the presence of pre-malignancy or malignancy in polyps as the dependent variable. A 10-fold cross-validation method was used to estimate the model’s misclassification risk.
Results: There were 240 women included, 181 of whom presented with postmenopausal bleeding. Their median age was 60 (range 45–94); 18/240 (7.5%) women were diagnosed with pre-malignant or malignant polyps. In our decision tree model, the polyp mean diameter (≤13 mm or >13 mm) on ultrasound was the most important predictor of pre-malignancy or malignancy. When the tree was allowed to grow, the patient’s body mass index (BMI) and the cystic/solid appearance of the polyp classified women further into low-risk (≤5%), intermediate-risk (>5%–≤20%), or high-risk (>20%) groups.
Conclusions: Our decision tree model may serve as a guide to counsel women on the benefits and risks of surgery for postmenopausal endometrial polyps. It may also assist clinicians in prioritizing women for surgery according to their risk of malignancy.
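
The three risk strata reported in the Results can be written as a small lookup. The sketch below only encodes the thresholds stated in the abstract (≤5%, >5%–≤20%, >20%); the function name risk_group is an assumption for illustration, and the code does not reproduce the CHAID tree itself.

```python
def risk_group(risk):
    """Map a predicted proportion (0..1) of pre-malignancy or malignancy to a stratum."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a proportion between 0 and 1")
    if risk <= 0.05:
        return "low"
    if risk <= 0.20:
        return "intermediate"
    return "high"

# Overall prevalence in the cohort, from the abstract: 18 of 240 women.
overall_prevalence = 18 / 240
print(f"{overall_prevalence:.1%}", risk_group(overall_prevalence))  # 7.5% intermediate
```

Applied to the cohort as a whole, the 7.5% baseline prevalence sits in the intermediate band, which is why further splits on diameter, BMI, and polyp appearance are needed to separate low-risk from high-risk women.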

Reanalysis and External Validation of a Decision Tree Model for Detecting Unrecognized Diabetes in Rural Chinese Individuals

International Journal of Endocrinology ◽ 10.1155/2017/3894870 ◽ 2017 ◽ Vol 2017 ◽ pp. 1-6 ◽ Cited By ~ 2

Author(s): Zhong Xin ◽ Lin Hua ◽ Xu-Hong Wang ◽ Dong Zhao ◽ Cai-Guo Yu ◽ ...

Keyword(s): Decision Tree ◽ Predictive Value ◽ Early Stage ◽ Current Model ◽ External Validation ◽ Area Under The Curve ◽ Decision Tree Model ◽ Tree Model ◽ Chinese Adult ◽ Significant Difference

We reanalyzed previous data to develop a simpler decision tree model as a screening tool for unrecognized diabetes, using basic information in Beijing community health records. The model was then validated in another rural town. The new model uses only three non-laboratory-based risk factors (age, BMI, and presence of hypertension) and has fewer branches. The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) for detecting diabetes were calculated. The AUC values in the internal and external validation groups were 0.708 and 0.629, respectively. Subjects at high risk of diabetes had significantly higher HOMA-IR, but no significant difference in HOMA-B was observed. This simple tool will help general practitioners and residents assess the risk of diabetes quickly and easily. This study also validates the strong association between insulin resistance and early-stage diabetes, suggesting that more attention should be paid to the current model in rural Chinese adult populations.
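
The screening metrics named in this abstract follow directly from a 2×2 confusion table, and the AUC can be read as the probability that a randomly chosen diabetic subject receives a higher risk score than a randomly chosen non-diabetic one. The sketch below illustrates both definitions. The function names are assumptions, and every count and score in the example is hypothetical, chosen only to make the arithmetic easy to follow; none of it is data from the study.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased subjects correctly flagged
        "specificity": tn / (tn + fp),  # healthy subjects correctly cleared
        "ppv": tp / (tp + fp),          # flagged subjects who are diseased
        "npv": tn / (tn + fn),          # cleared subjects who are healthy
    }

def auc(scores_pos, scores_neg):
    """AUC as P(score of a positive > score of a negative); ties count one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical counts and risk scores, for illustration only.
print(screening_metrics(tp=30, fp=40, fn=10, tn=120))
print(auc([0.9, 0.6, 0.4], [0.5, 0.3, 0.4, 0.1]))  # 0.875
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.708 (internal) and 0.629 (external).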
