Classification Models Using Decision Tree, Random Forest, and Moving Average Analysis

2020, pp. 91-115
Author(s): Rohit Dutt, Harish Dureja, A. K. Madan

2020, Vol 28 (6), pp. 1273-1291
Author(s): Nesreen El-Rayes, Ming Fang, Michael Smith, Stephen M. Taylor

Purpose: The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes.
Design/methodology/approach: A data set of resumes anonymously submitted through Glassdoor’s online portal is used in tandem with public company review information to fit decision tree, random forest and gradient boosted tree models that predict the probability of an employee leaving a firm during a job transition.
Findings: Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee’s decision to leave a firm.
Practical implications: This study may be used by human resources staff to better understand the factors that influence employee attrition. In addition, the techniques developed in this study may be applied to company-specific data sets to construct customized attrition models.
Originality/value: This study makes several novel contributions. These include exploratory analyses, such as industry job transition percentages and comparisons of the distributions of the factors most strongly associated with attrition between employees who left the firm and those who stayed, and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance for employee attrition.
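All three model families named above are built from decision trees: a random forest averages many trees fit on bootstrap samples, and gradient boosting fits trees sequentially to the errors of the previous ones. As a minimal pure-Python sketch (not the authors' code), the tree itself can be grown by repeatedly choosing the split that most reduces Gini impurity, CART-style. The function names, the depth limits and the toy rating features below are hypothetical, and the toy rows are invented for illustration only.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = List[float]


def gini(labels: List[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[Row], y: List[int]) -> Optional[Tuple[int, float]]:
    """Find the (feature, threshold) whose split most reduces weighted Gini impurity."""
    n = len(y)
    best, best_score = None, gini(y)  # a split must beat the parent's impurity
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # skip thresholds that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(X: List[Row], y: List[int], depth: int = 0,
               max_depth: int = 3, min_size: int = 2) -> Dict:
    """Grow the tree recursively; each leaf stores the fraction of positives, P(y = 1)."""
    split = best_split(X, y) if depth < max_depth and len(y) >= min_size else None
    if split is None:
        return {"leaf": sum(y) / len(y)}
    j, t = split
    go_left = [X[i][j] <= t for i in range(len(y))]

    def child(side: bool) -> Dict:
        idx = [i for i in range(len(y)) if go_left[i] == side]
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_size)

    return {"feature": j, "threshold": t, "left": child(True), "right": child(False)}


def predict_proba(tree: Dict, x: Row) -> float:
    """Follow the splits from the root to a leaf and return that leaf's P(y = 1)."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Hypothetical toy rows: [compensation, culture, senior management] ratings on a 1-5 scale;
# label 1 = the employee left. Invented values, not data from the study.
X = [[2, 2, 1], [1, 3, 2], [2, 1, 2], [4, 4, 5], [5, 4, 4], [4, 5, 3]]
y = [1, 1, 1, 0, 0, 0]
tree = build_tree(X, y)
print(predict_proba(tree, [1, 2, 2]))  # 1.0
```

The leaf stores a probability rather than a hard label so that the same tree can be thresholded for classification or averaged with other trees, as a random forest does.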


Author(s): A.K. Madan, Rohit Dutt

In the present study, the application of a wide variety of topological descriptors was investigated for predicting the hydrophobicity (clogP) of isatin analogues. A total of four topochemical indices, selected through a decision tree (DT), were used to develop single-index models using moving average analysis (MAA). The overall accuracy of prediction with regard to hydrophobicity ranged from 95% to 98%. The values of sensitivity, specificity and Matthews correlation coefficient for all MAA-based models with regard to hydrophobicity (clogP) were found to be =78%, =94% and =0.85, respectively, suggesting the robustness of the proposed models. Since compounds with high clogP values were found to be effective in inhibiting carboxylesterases (CEs), the highly hydrophobic ranges of the proposed MAA models can easily be exploited for the design and development of potent CE inhibitors.
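The accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC) quoted above are all computed from the four cells of a binary confusion matrix; in particular MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch follows, assuming labels are coded 1 for the positive class and 0 otherwise (the function name and the toy labels are hypothetical, not taken from the study).

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        # MCC is undefined when any margin is zero; report 0.0 by convention.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical toy labels: 4 positives, 6 negatives; one positive missed, one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

With TP = 3, TN = 5, FP = 1 and FN = 1, this gives accuracy 0.8, sensitivity 0.75, specificity 5/6 and MCC 14/24 ≈ 0.583. The example shows why MCC is reported alongside accuracy: it stays well below the accuracy figure when errors fall on the smaller class.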


Author(s): Sasmita Kumari Nayak, Mamata Beura, Mohammed Siddique, Siba Prasad Mishra

Food is essential to human life. The objective of the current study is to characterise, classify and compare the food consumption patterns of Indian diets, namely vegetarian and non-vegetarian. Given data about different Indian dishes, we predict whether each dish is vegetarian. To identify the best predictive model, we compare the Decision Tree, K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Random Forest algorithms, and describe the concept and implementation of all four models for this prediction task. All four models are trained and tested on an Indian food dataset containing 255 records in total. In short, the Decision Tree and KNN models performed worse than the other models used here. In the simulations, the Random Forest model was generally more accurate than the SVM, KNN and Decision Tree models.
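Of the four algorithms compared, KNN is the simplest to state: a dish is assigned the majority label among the k training dishes closest to it. A minimal pure-Python sketch follows. It assumes each dish has already been encoded as a numeric feature vector; the three binary ingredient indicators used here (dairy, meat, fish), the function names and the toy rows are all hypothetical and are not the paper's features or data.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Label x by majority vote among its k nearest training rows (Euclidean distance)."""
    # Sort (distance, label) pairs; equal distances fall back to alphabetical label order.
    neighbours = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k: int = 3) -> float:
    """Fraction of held-out rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Hypothetical encoding: [uses_dairy, uses_meat, uses_fish]; invented toy rows.
train_X = [[1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
train_y = ["vegetarian"] * 3 + ["non vegetarian"] * 3
print(knn_predict(train_X, train_y, [1, 0, 0]))  # vegetarian
print(knn_predict(train_X, train_y, [0, 1, 1]))  # non vegetarian
```

An odd k avoids tied votes in a two-class problem. Because KNN relies on raw distances, its accuracy depends heavily on how features are encoded and scaled, which is one plausible reason it can trail tree ensembles on tabular data.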


2020, Vol 4 (Supplement_1), pp. 268-269
Author(s): Jaime Speiser, Kathryn Callahan, Jason Fanning, Thomas Gill, Anne Newman, ...

Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in the diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism about these methods remains due to a lack of reproducibility and the difficulty of understanding the complex algorithms behind the models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. In the LIFE study data, the prediction models for serious fall injury performed moderately at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared with traditional models for modeling outcomes in aging, but their use should be justified and their output carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
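Two ideas in this abstract can be made concrete: how a random forest aggregates its trees, and what the reported area under the ROC curve (AUC) measures. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), so 0.5 is chance-level ranking, which puts the decision tree's 0.54 close to chance. The minimal sketch below averages per-tree probabilities, as scikit-learn's `RandomForestClassifier` does (Breiman's original formulation uses a majority vote instead), and computes AUC by direct pairwise comparison. The function names and the per-tree probabilities are hypothetical and invented for illustration, not LIFE study output.

```python
from typing import Sequence


def forest_probability(tree_probs: Sequence[float]) -> float:
    """Aggregate a random forest by averaging the per-tree estimates of P(y = 1)."""
    return sum(tree_probs) / len(tree_probs)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive outscores random negative); ties count 1/2. O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical probabilities from a 3-tree forest for 4 subjects (1 = serious fall injury).
tree_probs_per_subject = [
    [0.9, 0.6, 0.8],  # subject 1, injured
    [0.6, 0.7, 0.5],  # subject 2, not injured
    [0.7, 0.5, 0.3],  # subject 3, injured
    [0.1, 0.3, 0.2],  # subject 4, not injured
]
labels = [1, 0, 1, 0]
scores = [forest_probability(p) for p in tree_probs_per_subject]
print(roc_auc(scores, labels))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly (subject 3 scores about 0.5, below subject 2's 0.6), giving an AUC of 0.75. The pairwise loop is quadratic; for large cohorts a rank-based (Mann-Whitney) computation gives the same value more efficiently.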

