Predictors of Turnover Intention in U.S. Federal Government Workforce: Machine Learning Evidence That Perceived Comprehensive HR Practices Predict Turnover Intention

This study aims to identify important predictors of turnover intention and to characterize subgroups of U.S. federal employees at high risk for turnover intention. Data were drawn from the 2018 Federal Employee Viewpoint Survey (FEVS, unweighted N = 598,003), a nationally representative sample of U.S. federal employees. Machine learning Classification and Regression Tree (CART) analyses were conducted to predict turnover intention and accounted for sample weights. CART analyses identified six at-risk subgroups. Predictor importance scores showed job satisfaction was the strongest predictor of turnover intention, followed by satisfaction with organization, loyalty, accomplishment, involvement in decisions, likeness to job, satisfaction with promotion opportunities, skill development opportunities, organizational tenure, and pay satisfaction. Consequently, Human Resource (HR) departments should seek to implement comprehensive HR practices to enhance employees’ perceptions on job satisfaction, workplace environments and systems, and favorable organizational policies and supports and make tailored interventions for the at-risk subgroups.

Download Full-text

Machine Learning Classification Models for More Effective Mine Safety Inspections

Volume 14: Emerging Technologies; Engineering Management, Safety, Ethics, Society, and Education; Materials: Genetics to Structures ◽

10.1115/imece2014-38709 ◽

2014 ◽

Author(s):

Jeremy M. Gernand

Keyword(s):

Machine Learning ◽

Safety Management ◽

Regression Tree ◽

The United States ◽

Worker Safety ◽

Classification And Regression Tree ◽

Mine Safety ◽

Health Administration ◽

Mining Operations ◽

Machine Learning Classification

The safety of mining in the United States has improved significantly over the past few decades, although it remains one of the more dangerous occupations. Following the Sago mine disaster in January 2006, federal legislation (The Mine Improvement and New Emergency Response {MINER} Act of 2006) tightened regulations and sought to strengthen the authority and safety inspection practices of the Mine Safety and Health Administration (MSHA). While penalties and inspection frequency have increased, understanding of what types of inspection findings are most indicative of serious future incidents is limited. The most effective safety management and oversight could be accomplished by a thorough understanding of what types of infractions or safety inspection findings are most indicative of serious future personnel injuries. However, given the large number of potentially different and unique inspection findings, varied mine characteristics, and types of specific safety incidents, this question is complex in terms of the large number of potentially relevant input parameters. New regulations rely on increasing the frequency and severity of infraction penalties to encourage mining operations to improve worker safety, but without the knowledge of which specific infractions may truly be signaling a dangerous work environment. This paper seeks to inform the question, what types of inspection findings are most indicative of serious future incidents for specific types of mining operations? This analysis utilizes publicly available MSHA databases of cited infractions and reportable incidents. These inspection results are used to train machine learning Classification and Regression Tree (CART) and Random Forest (RF) models that divide the groups of mines into peer groups based on their recent infractions and other defining characteristics with the aim of predicting whether or not a fatal or serious disabling injury is more likely to occur in the following 12-month period. With these characteristics available, additional scrutiny may be appropriately directed at those mining operations at greatest risk of experiencing a worker fatality or disabling injury in the near future. Increased oversight and attention on these mines where workers are at greatest risk may more effectively reduce the likelihood of worker deaths and injuries than increased penalties and inspection frequency alone.

Download Full-text

Flood Susceptibility Modeling in a Subtropical Humid Low-Relief Alluvial Plain Environment: Application of Novel Ensemble Machine Learning Approach

Frontiers in Earth Science ◽

10.3389/feart.2021.659296 ◽

2021 ◽

Vol 9 ◽

Author(s):

Manish Pandey ◽

Aman Arora ◽

Alireza Arabameri ◽

Romulus Costache ◽

Naveen Kumar ◽

...

Keyword(s):

Machine Learning ◽

Regression Tree ◽

Classification And Regression Tree ◽

Ground Subsidence ◽

Ensemble Model ◽

Ganga Plain ◽

Humid Climate ◽

Area Index ◽

Flood Susceptibility ◽

Middle Ganga Plain

This study has developed a new ensemble model and tested another ensemble model for flood susceptibility mapping in the Middle Ganga Plain (MGP). The results of these two models have been quantitatively compared for performance analysis in zoning flood susceptible areas of low altitudinal range, humid subtropical fluvial floodplain environment of the Middle Ganga Plain (MGP). This part of the MGP, which is in the central Ganga River Basin (GRB), is experiencing worse floods in the changing climatic scenario causing an increased level of loss of life and property. The MGP experiencing monsoonal subtropical humid climate, active tectonics induced ground subsidence, increasing population, and shifting landuse/landcover trends and pattern, is the best natural laboratory to test all the susceptibility prediction genre of models to achieve the choice of best performing model with the constant number of input parameters for this type of topoclimatic environmental setting. This will help in achieving the goal of model universality, i.e., finding out the best performing susceptibility prediction model for this type of topoclimatic setting with the similar number and type of input variables. Based on the highly accurate flood inventory and using 12 flood predictors (FPs) (selected using field experience of the study area and literature survey), two machine learning (ML) ensemble models developed by bagging frequency ratio (FR) and evidential belief function (EBF) with classification and regression tree (CART), CART-FR and CART-EBF, were applied for flood susceptibility zonation mapping. Flood and non-flood points randomly generated using flood inventory have been apportioned in 70:30 ratio for training and validation of the ensembles. Based on the evaluation performance using threshold-independent evaluation statistic, area under receiver operating characteristic (AUROC) curve, 14 threshold-dependent evaluation metrices, and seed cell area index (SCAI) meant for assessing different aspects of ensembles, the study suggests that CART-EBF (AUCSR = 0.843; AUCPR = 0.819) was a better performant than CART-FR (AUCSR = 0.828; AUCPR = 0.802). The variability in performances of these novel-advanced ensembles and their comparison with results of other published models espouse the need of testing these as well as other genres of susceptibility models in other topoclimatic environments also. Results of this study are important for natural hazard managers and can be used to compute the damages through risk analysis.

Download Full-text

GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran

Environmental Monitoring and Assessment ◽

10.1007/s10661-015-5049-6 ◽

2015 ◽

Vol 188 (1) ◽

Cited By ~ 208

Author(s):

Seyed Amir Naghibi ◽

Hamid Reza Pourghasemi ◽

Barnali Dixon

Keyword(s):

Machine Learning ◽

Regression Tree ◽

Groundwater Potential ◽

Classification And Regression Tree ◽

Learning Models ◽

Boosted Regression Tree ◽

Potential Mapping ◽

Classification And Regression ◽

Groundwater Potential Mapping ◽

Machine Learning Models

Download Full-text

Single-Channel EEG-Based Machine Learning Method for Prescreening Major Depressive Disorder

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622019500342 ◽

2019 ◽

Vol 18 (05) ◽

pp. 1579-1603 ◽

Cited By ~ 2

Author(s):

Zhijiang Wan ◽

Hao Zhang ◽

Jiajin Huang ◽

Haiyan Zhou ◽

Jie Yang ◽

...

Keyword(s):

Machine Learning ◽

Major Depressive Disorder ◽

Depressive Disorder ◽

Single Channel ◽

Regression Tree ◽

Classification And Regression Tree ◽

Machine Learning Method ◽

Learning Method ◽

Eeg Analysis ◽

Major Depressive

Many studies developed the machine learning method for discriminating Major Depressive Disorder (MDD) and normal control based on multi-channel electroencephalogram (EEG) data, less concerned about using single channel EEG collected from forehead scalp to discriminate the MDD. The EEG dataset is collected by the Fp1 and Fp2 electrode of a 32-channel EEG system. The result demonstrates that the classification performance based on the EEG of Fp1 location exceeds the performance based on the EEG of Fp2 location, and shows that single-channel EEG analysis can provide discrimination of MDD at the level of multi-channel EEG analysis. Furthermore, a portable EEG device collecting the signal from Fp1 location is used to collect the second dataset. The Classification and Regression Tree combining genetic algorithm (GA) achieves the highest accuracy of 86.67% based on leave-one-participant-out cross validation, which shows that the single-channel EEG-based machine learning method is promising to support MDD prescreening application.

Download Full-text

Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing

JMIR Medical Informatics ◽

10.2196/18910 ◽

2020 ◽

Vol 8 (7) ◽

pp. e18910

Author(s):

Debbie Rankin ◽

Michaela Black ◽

Raymond Bond ◽

Jonathan Wallace ◽

Maurice Mulvenna ◽

...

Keyword(s):

Machine Learning ◽

Health Care ◽

Bayesian Network ◽

Synthetic Data ◽

Regression Tree ◽

Real Data ◽

Classification And Regression Tree ◽

Supervised Machine Learning ◽

Statistical Disclosure ◽

Classification And Regression

Background The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce. Objective This work sets out to understand the difference in performance of supervised machine learning models trained on synthetic data compared with those trained on real data. Methods A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression trees, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed. Results A total of 92% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data have deviations in accuracy from models trained on real data of 0.177 (18%) to 0.193 (19%), while other models have lower deviations of 0.058 (6%) to 0.072 (7%). The winning classifier when trained and tested on real data versus models trained on synthetic data and tested on real data is the same in 26% (5/19) of cases for classification and regression tree and parametric synthetic data and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best with real data and are the winning classifier in 95% (18/19) of cases. This is not the case for models trained on synthetic data. When tree-based models are not considered, the winning classifier for real and synthetic data is matched in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility. Conclusions The results of this study are promising with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making.

Download Full-text

Estimating the Optimal Dexketoprofen Pharmaceutical Formulation with Machine Learning Methods and Statistical Approaches

Healthcare Informatics Research ◽

10.4258/hir.2021.27.4.279 ◽

2021 ◽

Vol 27 (4) ◽

pp. 279-286

Author(s):

Atakan Başkor ◽

Yağmur Pirinçci Tok ◽

Burcu Mesut ◽

Yıldız Özsoy ◽

Tamer Uçar

Keyword(s):

Machine Learning ◽

Regression Tree ◽

Cost Effective ◽

Pharmaceutical Formulation ◽

Classification And Regression Tree ◽

Gradient Boosting ◽

Support Vector ◽

Disintegration Time ◽

Pharmaceutical Dosage Form ◽

Extreme Gradient Boosting

Objectives: Orally disintegrating tablets (ODTs) can be utilized without any drinking water; this feature makes ODTs easy to use and suitable for specific groups of patients. Oral administration of drugs is the most commonly used route, and tablets constitute the most preferable pharmaceutical dosage form. However, the preparation of ODTs is costly and requires long trials, which creates obstacles for dosage trials. The aim of this study was to identify the most appropriate formulation using machine learning (ML) models of ODT dexketoprofen formulations, with the goal of providing a cost-effective and timereducing solution.Methods: This research utilized nonlinear regression models, including the k-nearest neighborhood (k-NN), support vector regression (SVR), classification and regression tree (CART), bootstrap aggregating (bagging), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost) methods, as well as the t-test, to predict the quantity of various components in the dexketoprofen formulation within fixed criteria.Results: All the models were developed with Python libraries. The performance of the ML models was evaluated with R2 values and the root mean square error. Hardness values of 0.99 and 2.88, friability values of 0.92 and 0.02, and disintegration time values of 0.97 and 10.09 using the GBM algorithm gave the best results.Conclusions: In this study, we developed a computational approach to estimate the optimal pharmaceutical formulation of dexketoprofen. The results were evaluated by an expert, and it was found that they complied with Food and Drug Administration criteria.

Download Full-text

The Link between HR Attributions and Employees’ Turnover Intentions

Gadjah Mada International Journal of Business ◽

10.22146/gamaijb.9287 ◽

2016 ◽

Vol 18 (1) ◽

pp. 55 ◽

Cited By ~ 6

Author(s):

Juliana Caesaria Tandung

Keyword(s):

Job Satisfaction ◽

Human Resources ◽

Turnover Intention ◽

Turnover Intentions ◽

Human Resources Management ◽

Intention To Leave ◽

Individual Level ◽

Hr Practices ◽

And Behavior ◽

Attitudes And Behavior

Human Resources Management (HRM) is part of the organizational functions that contribute to the effectiveness of a firm’s performance, and brings an organization a competitive advantage through the implementation of its Human Resources (HR) practices. HR practices adopted by management are perceived or attributed subjectively by individual employees, and can in turn affect the employees’ attitudes and behavior (e.g. Job satisfaction and turnover intention). The purpose of this study is to contribute to the process-based approach by investigating the effect of HR attributions on turnover intentions, with job satisfaction playing a mediating role. The analysis is on the individual level, with 454 respondents from various organizations within the Netherlands. The results show that HR attributions can affect the turnover intention, through the presence of job satisfaction. Thus, it can be said that it is important to always consider the employees’ attitudes and behavior when examining their perception of HR practices, and in predicting their intention to leave.

Download Full-text

Consensus of Feature Selection Methods and Reduced Generalization Gap Model to Improve Diagnosis of Heart Disease

Journal of Scientific Research ◽

10.3329/jsr.v13i3.53290 ◽

2021 ◽

Vol 13 (3) ◽

pp. 901-913

Author(s):

S. Gupta ◽

R. R. Sedamkar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Heart Disease ◽

Missing Values ◽

Performance Metrics ◽

Model Performance ◽

Regression Tree ◽

Classification And Regression Tree ◽

Proposed Model ◽

Time Required

Enhancing the diagnostic ability of Machine Learning models for acceptable prediction in the healthcare community is still a concern. There are critical care disease datasets available online on which researchers have experimented with a different number of instances and features for similar disease prediction. Further, different Machine Learning (ML) models have different preprocessing requirements. Framingham heart disease data is multicollinear and has missing values. Thus, the proposed model aims to explore the differential preprocessing needs of ML models followed by feature selection in consensus with domain experts and feature extraction to resolve multicollinearity issues. Missing values have been imputed differently for each feature. The work also identifies optimal train set size by plotting a learning curve that provides a minimum generalization gap. When testing is done on this hyperparameter tuned model, performance is enhanced with respect to the F score weighted by support and stratification since the data is imbalanced. Experimental results demonstrate improvement in performance metrics, i.e., weighted F score, precision, recall, accuracy up to 3 %, and F1 score by 8 % for Logistic Regression Classifier with the proposed model. Further, the time required for hyperparameter tuning is reduced by 50% for tree-based models, particularly Classification and Regression Tree (CART).

Download Full-text

HOW MACHINE LEARNING METHOD PERFORMANCE FOR IMBALANCED DATA

TEKNOKOM ◽

10.31943/teknokom.v4i2.64 ◽

2021 ◽

Vol 4 (2) ◽

pp. 48-52

Author(s):

Pardomuan Robinson Sihombing

Keyword(s):

Machine Learning ◽

Regression Tree ◽

Imbalanced Data ◽

Large Data ◽

Original Data ◽

Classification And Regression Tree ◽

Support Vector ◽

Method Performance ◽

Survey Statistics ◽

Working Status

This study will examine the application of several classification methods to machine learning models by taking into account the case of imbalanced data. The research was conducted on a case study of classification modeling for working status in Banten Province in 2020. The data used comes from the National Labor Force Survey, Statistics Indonesia. The machine learning methods used are Classification and Regression Tree (CART), Naïve Bayes, Random Forest, Rotation Forest, Support Vector Machine (SVM), Neural Network Analysis, One Rule (OneR), and Boosting. Classification modeling using resample techniques in cases of imbalanced data and large data sets is proven to improve classification accuracy, especially for minority classes, which can be seen from the sensitivity and specificity values that are more balanced than the original data (without treatment). Furthermore, the eight classification models tested shows that the Boost model provides the best performance based on the highest sensitivity, specificity, G-mean, and kappa coefficient values. The most important/most influential variables in the classification of working status are marital status, education, and age.

Download Full-text

Moving the Needle on Evaluating Multivariate Screening Accuracy

Assessment for Effective Intervention ◽

10.1177/1534508418791740 ◽

2018 ◽

Vol 45 (2) ◽

pp. 83-94

Author(s):

Yaacov Petscher ◽

Sharon Koon

Keyword(s):

At Risk ◽

Logistic Regression ◽

Regression Tree ◽

Classification And Regression Tree ◽

Stanford Achievement Test ◽

School Year ◽

Trade Offs ◽

Universal Screener ◽

Screening Assessment ◽

Screening Accuracy

The assessment of screening accuracy and setting of cut points for a universal screener have traditionally been evaluated using logistic regression analysis. This analytic technique has been frequently used to evaluate the trade-offs in correct classification with misidentification of individuals who are at risk of performing poorly on a later outcome. Although useful statistically, coefficients from a multiple logistic regression can be difficult to explain to practitioners as it pertains to classification decisions. Moreover, classifications based on multivariate assessments are challenging to understand how performance on one assessment compensates for performance on another. The purpose of this article is to demonstrate and compare the use of logistic regression with classification and regression tree (CART) models in the identification of students who are at risk of reading comprehension difficulties. Data consisted of 986 Grade 1 students and 887 Grade 2 students who were administered a screening assessment at the middle of the school year as well as the 10th edition of the Stanford Achievement Test. Results indicated that CART performs comparably with logistic regression and may assist researchers and practitioners in explaining classification rules to parents and educators.

Download Full-text