Intelligent Employee Retention System for Attrition Rate Analysis and Churn Prediction

The paper aims to examine the factors that influence employee attrition rate using the employee records dataset from kaggle.com. It also aims to establish the predictive power of Deep Learning for employee churn prediction over ensemble machine learning techniques like Random Forest and Gradient Boosting on real-time employee data from a mid-sized Fast-Moving Consumer Goods (FMCG) company. The results are further validated through a regression model and also by a multi-criteria Fuzzy Analytical Hierarchy Process (AHP) model which takes into account the relative variable importance and computes weights. The empirical results of the machine learning models indicate that Deep Neural Networks (91.2% accuracy) are a better predictor of churn than Random Forest and Gradient Boosting Algorithm (82.3% and 85.2% respectively). These findings provide useful insights for human resource (HR) managers in an organizational workplace context. The model when recalibrated by the human resource team of organizations helps in better incentivization and employee retention.

Download Full-text

Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Webology ◽

10.14704/web/v18si01/web18053 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 183-195

Author(s):

Thingbaijam Lenin ◽

N. Chandrasekaran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Missing Values ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Boosting ◽

Stochastic Gradient Boosting ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Student’S Performance

Student’s academic performance is one of the most important parameters for evaluating the standard of any institute. It has become a paramount importance for any institute to identify the student at risk of underperforming or failing or even drop out from the course. Machine Learning techniques may be used to develop a model for predicting student’s performance as early as at the time of admission. The task however is challenging as the educational data required to explore for modelling are usually imbalanced. We explore ensemble machine learning techniques namely bagging algorithm like random forest (rf) and boosting algorithms like adaptive boosting (adaboost), stochastic gradient boosting (gbm), extreme gradient boosting (xgbTree) in an attempt to develop a model for predicting the student’s performance of a private university at Meghalaya using three categories of data namely demographic, prior academic record, personality. The collected data are found to be highly imbalanced and also consists of missing values. We employ k-nearest neighbor (knn) data imputation technique to tackle the missing values. The models are developed on the imputed data with 10 fold cross validation technique and are evaluated using precision, specificity, recall, kappa metrics. As the data are imbalanced, we avoid using accuracy as the metrics of evaluating the model and instead use balanced accuracy and F-score. We compare the ensemble technique with single classifier C4.5. The best result is provided by random forest and adaboost with F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.

Download Full-text

Churn Prediction of Employees Using Machine Learning Techniques

Tehnički glasnik ◽

10.31803/tg-20210204181812 ◽

2021 ◽

Vol 15 (1) ◽

pp. 51-59

Author(s):

Nilasha Bandyopadhyay ◽

Anil Jadhav

Keyword(s):

Random Forest ◽

Confusion Matrix ◽

False Positive Rate ◽

Random Forest Classifier ◽

Working Environment ◽

Attrition Rate ◽

Machine Learning Techniques ◽

Churn Prediction ◽

Technology Industry ◽

Positive Rate

Employees are considered as the most valuable assets of any organization. Various policies have been introduced by the HR professionals to create a good working environment for them, but still, the rate of employees quitting the Technology Industry is quite high. Often the reason behind their early attrition could be due to company-related or personal issues, such as No satisfaction at the workplace, Fewer opportunities for learning, Undue Workload, Less Encouragement, and many others. This paper aims in discussing a structured way for predicting the churn rate of the employees by implementing various Classification techniques like SVM, Random Forest classifier, and Naives Bayes classifier. The performance of the classifiers was compared using metrics like Confusion Matrix, Recall, False Positive Rate, and Accuracy to determine the best model for the churn prediction. We found that among the models, the Random Forest classifier proved to be the best model for IT employee churn prediction. A Correlation Matrix was generated in the form of a heatmap to identify the important features that might impact the attrition rate.

Download Full-text

Prediction of probable backorder scenarios in the supply chain using Distributed Random Forest and Gradient Boosting Machine learning techniques

Journal Of Big Data ◽

10.1186/s40537-020-00345-2 ◽

2020 ◽

Vol 7 (1) ◽

Cited By ~ 1

Author(s):

Samiul Islam ◽

Saman Hassanzadeh Amin

Keyword(s):

Machine Learning ◽

Supply Chain ◽

Random Forest ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Learning Techniques ◽

Gradient Boosting Machine

Download Full-text

Estimating Warehouse Rental Price using Machine Learning Techniques

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2018.2.3034 ◽

2018 ◽

Vol 13 (2) ◽

pp. 235-250 ◽

Cited By ~ 3

Author(s):

Yixuan Ma ◽

Zhenji Zhang ◽

Alexander Ihler ◽

Baoxiang Pan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Real Estate ◽

Rapid Development ◽

Supply And Demand ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Logistics Industry ◽

Real Estate Price ◽

Learning Techniques

Boosted by the growing logistics industry and digital transformation, the sharing warehouse market is undergoing a rapid development. Both supply and demand sides in the warehouse rental business are faced with market perturbations brought by unprecedented peer competitions and information transparency. A key question faced by the participants is how to price warehouses in the open market. To understand the pricing mechanism, we built a real world warehouse dataset using data collected from the classified advertisements websites. Based on the dataset, we applied machine learning techniques to relate warehouse price with its relevant features, such as warehouse size, location and nearby real estate price. Four candidate models are used here: Linear Regression, Regression Tree, Random Forest Regression and Gradient Boosting Regression Trees. The case study in the Beijing area shows that warehouse rent is closely related to its location and land price. Models considering multiple factors have better skill in estimating warehouse rent, compared to singlefactor estimation. Additionally, tree models have better performance than the linear model, with the best model (Random Forest) achieving correlation coefficient of 0.57 in the test set. Deeper investigation of feature importance illustrates that distance from the city center plays the most important role in determining warehouse price in Beijing, followed by nearby real estate price and warehouse size.

Download Full-text

Comparative Analysis of Machine Learning Techniques to Identify Churn for Telecom Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.34.19210 ◽

2018 ◽

Vol 7 (3.34) ◽

pp. 291

Author(s):

M Malleswari ◽

R.J Manira ◽

Praveen Kumar ◽

Murugan .

Keyword(s):

Machine Learning ◽

Big Data ◽

Random Forest ◽

Decision Tree ◽

Apache Spark ◽

Machine Learning Techniques ◽

Churn Prediction ◽

Learning Techniques ◽

Boosted Tree ◽

Customer Attrition

Big data analytics has been the focus for large scale data processing. Machine learning and Big data has great future in prediction. Churn prediction is one of the sub domain of big data. Preventing customer attrition especially in telecom is the advantage of churn prediction. Churn prediction is a day-to-day affair involving millions. So a solution to prevent customer attrition can save a lot. This paper propose to do comparison of three machine learning techniques Decision tree algorithm, Random Forest algorithm and Gradient Boosted tree algorithm using Apache Spark. Apache Spark is a data processing engine used in big data which provides in-memory processing so that the processing speed is higher. The analysis is made by extracting the features of the data set and training the model. Scala is a programming language that combines both object oriented and functional programming and so a powerful programming language. The analysis is implemented using Apache Spark and modelling is done using scala ML. The accuracy of Decision tree model came out as 86%, Random Forest model is 87% and Gradient Boosted tree is 85%.

Download Full-text

Classification of Agriculture Farm Machinery Using Machine Learning and Internet of Things

Symmetry ◽

10.3390/sym13030403 ◽

2021 ◽

Vol 13 (3) ◽

pp. 403

Author(s):

Muhammad Waleed ◽

Tai-Won Um ◽

Tariq Kamal ◽

Syed Muhammad Usman

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Farm Machinery ◽

Learning Techniques

In this paper, we apply the multi-class supervised machine learning techniques for classifying the agriculture farm machinery. The classification of farm machinery is important when performing the automatic authentication of field activity in a remote setup. In the absence of a sound machine recognition system, there is every possibility of a fraudulent activity taking place. To address this need, we classify the machinery using five machine learning techniques—K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). For training of the model, we use the vibration and tilt of machinery. The vibration and tilt of machinery are recorded using the accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator and cultivator. The preliminary analysis on the collected data revealed that the farm machinery (when in operation) showed big variations in vibration and tilt, but observed similar means. Additionally, the accuracies of vibration-based and tilt-based classifications of farm machinery show good accuracy when used alone (with vibration showing slightly better numbers than the tilt). However, the accuracies improve further when both (the tilt and vibration) are used together. Furthermore, all five machine learning algorithms used for classification have an accuracy of more than 82%, but random forest was the best performing. The gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while the gradient boosting takes the most time.

Download Full-text

Predicting in-Hospital Mortality of Patients with COVID-19 Using Machine Learning Techniques

Journal of Personalized Medicine ◽

10.3390/jpm11050343 ◽

2021 ◽

Vol 11 (5) ◽

pp. 343

Author(s):

Fabiana Tezza ◽

Giulia Lorenzoni ◽

Danila Azzolina ◽

Sofia Barbar ◽

Lucia Anna Carmela Leone ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Hospital Mortality ◽

Learning Algorithm ◽

Vital Signs ◽

Mortality Prediction ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Learning Techniques

The present work aims to identify the predictors of COVID-19 in-hospital mortality testing a set of Machine Learning Techniques (MLTs), comparing their ability to predict the outcome of interest. The model with the best performance will be used to identify in-hospital mortality predictors and to build an in-hospital mortality prediction tool. The study involved patients with COVID-19, proved by PCR test, admitted to the “Ospedali Riuniti Padova Sud” COVID-19 referral center in the Veneto region, Italy. The algorithms considered were the Recursive Partition Tree (RPART), the Support Vector Machine (SVM), the Gradient Boosting Machine (GBM), and Random Forest. The resampled performances were reported for each MLT, considering the sensitivity, specificity, and the Receiving Operative Characteristic (ROC) curve measures. The study enrolled 341 patients. The median age was 74 years, and the male gender was the most prevalent. The Random Forest algorithm outperformed the other MLTs in predicting in-hospital mortality, with a ROC of 0.84 (95% C.I. 0.78–0.9). Age, together with vital signs (oxygen saturation and the quick SOFA) and lab parameters (creatinine, AST, lymphocytes, platelets, and hemoglobin), were found to be the strongest predictors of in-hospital mortality. The present work provides insights for the prediction of in-hospital mortality of COVID-19 patients using a machine-learning algorithm.

Download Full-text

Machine Learning Based Indoor Localisation Using Wi-Fi And Smartphone

Journal of Independent Studies and Research - Computing ◽

10.31645/06 ◽

2020 ◽

Author(s):

Zulqarnain Khokhar ◽

◽

Murtaza Ahmed Siddiqi ◽

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Indoor Localization ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Devices ◽

Gradient Boosting ◽

Learning Techniques ◽

Indoor Localisation

Wi-Fi based indoor positioning with the help of access points and smart devices have become an integral part in finding a device or a person’s location. Wi-Fi based indoor localization technology has been among the most attractive field for researchers for a number of years. In this paper, we have presented Wi-Fi based in-door localization using three different machine-learning techniques. The three machine learning algorithms implemented and compared are Decision Tree, Random Forest and Gradient Boosting classifier. After making a fingerprint of the floor based on Wi-Fi signals, mentioned algorithms were used to identify device location at thirty different positions on the floor. Random Forest and Gradient Boosting classifier were able to identify the location of the device with accuracy higher than 90%. While Decision Tree was able to identify the location with accuracy a bit higher than 80%.

Download Full-text