Uplift modeling VS conventional predictive model: A reliable machine learning model to solve employee turnover

Author(s):  
Davin Wijaya ◽  
Jumri Habbeyb DS ◽  
Samuelta Barus ◽  
Beriman Pasaribu ◽  
Loredana Ioana Sirbu ◽  
...  

Employee turnover is the loss of talent in the workforce that can be costly for a company. Uplift modeling is one of the prescriptive methods in machine learning that not only predicts an outcome but also prescribes a solution. Recent studies have focused on conventional predictive models to predict employee turnover rather than on uplift modeling. In this research, we analyze whether the uplift model performs better than the conventional predictive model in solving employee turnover. Performance was compared between the two methods by experimentation using two synthetic datasets and one real dataset. The results show that although the conventional predictive model yields an average prediction accuracy of 84%, it achieves a success rate of only 50% in targeting the right employees with a retention program across the three datasets. By contrast, the uplift model yields an average accuracy of only 67% but a consistent 100% success rate in targeting the right employees with a retention program.
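One common way to implement uplift modeling is the "two-model" (T-learner) approach: fit one outcome model on treated employees and one on controls, and score each employee by the difference. The sketch below is an illustration only, not the paper's implementation; the frequency-table "models" and the toy retention data are invented assumptions.

```python
from collections import defaultdict

def fit_rate_model(rows):
    """Frequency-based outcome model: feature value -> observed retention rate."""
    counts = defaultdict(lambda: [0, 0])  # value -> [stayed, total]
    for x, y in rows:
        counts[x][0] += y
        counts[x][1] += 1
    return {x: stayed / total for x, (stayed, total) in counts.items()}

def uplift_scores(treated_rows, control_rows):
    """Two-model uplift: P(stay | program, x) - P(stay | no program, x)."""
    m_t = fit_rate_model(treated_rows)
    m_c = fit_rate_model(control_rows)
    return {x: m_t.get(x, 0.0) - m_c.get(x, 0.0)
            for x in set(m_t) | set(m_c)}

# Toy data: (seniority, stayed) pairs with and without a retention program.
treated = [("junior", 1), ("junior", 1), ("senior", 0), ("senior", 1)]
control = [("junior", 0), ("junior", 1), ("senior", 0), ("senior", 1)]
scores = uplift_scores(treated, control)
# "junior" employees respond to the program (uplift 0.5); "senior" do not (0.0).
```

Targeting the employees with the highest uplift, rather than the highest churn risk, is what distinguishes this from a conventional predictive model.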

2020 ◽  
Vol 35 (Supplement_3) ◽  
Author(s):  
Jerry Yu ◽  
Andrew Long ◽  
Maria Hanson ◽  
Aleetha Ellis ◽  
Michael Macarthur ◽  
...  

Abstract Background and Aims There are many benefits to performing dialysis at home, including greater flexibility and more frequent treatments. A possible barrier to election of home therapy (HT) by in-center patients is a lack of adequate HT education. To aid efficient education efforts, a predictive model was developed to help identify patients who are more likely to switch from in-center care and succeed on HT. Method We developed a machine learning model to predict which in-center patients without prior HT history are most likely to switch to HT in the next 90 days and stay on HT for at least 90 days. Training data were extracted from 2016–2019 for approximately 300,000 patients. We randomly sampled one in-center treatment date per patient and determined whether the patient would switch to and succeed on HT. The input features consisted of treatment vitals, laboratories, absence history, comprehensive assessments, facility information, county-level housing, and patient characteristics. Patients were excluded if they had less than 30 days on dialysis, due to lack of data. A machine learning model (XGBoost classifier) was deployed monthly in a pilot with a team of HT educators to investigate the model’s utility for identifying HT candidates. Results Approximately 1,200 patients per month started a home therapy in a large dialysis provider, with approximately one-third being in-center patients. The prevalence of switching to and succeeding on HT in this population was 2.54%. The predictive model achieved an area under the curve of 0.87, sensitivity of 0.77, and specificity of 0.80 on a hold-out test dataset. The pilot was successfully executed for several months and two major lessons were learned: 1) some patients who reappeared on each month’s list should be removed from the list after expressing no interest in HT, and 2) a data collection mechanism should be put in place to capture the reasons why patients are not interested in HT. 
Conclusion This quality-improvement initiative demonstrates that predictive modeling can be used to identify patients likely to switch and succeed on home therapy. Integration of the model in existing workflows requires creating a feedback loop which can help improve future worklists.
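The feedback loop described in the conclusion can be sketched as a simple monthly worklist filter. Everything below (function name, patient IDs, threshold) is a hypothetical illustration; the pilot's real workflow is not specified at this level of detail.

```python
def build_worklist(scores, not_interested, threshold=0.5):
    """Rank in-center patients by predicted probability of switching to and
    succeeding on HT, dropping anyone who has already declined (the feedback
    loop suggested by the pilot's first lesson)."""
    eligible = {pid: p for pid, p in scores.items()
                if pid not in not_interested and p >= threshold}
    return sorted(eligible, key=eligible.get, reverse=True)

# Toy monthly run: patient "p2" declined last month, so they are excluded.
scores = {"p1": 0.91, "p2": 0.88, "p3": 0.55, "p4": 0.30}
worklist = build_worklist(scores, not_interested={"p2"})
# worklist == ["p1", "p3"]
```

The second lesson (capturing reasons for disinterest) would extend `not_interested` into a structured log rather than a plain set.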


Author(s):  
Ganapathy Subramaniam Balasubramanian, et al.

Understanding workplace incidents is one of the necessary measures in a workplace safety strategy. Analyzing trends in incident data helps to spot potential pain points and to reduce losses. Applying machine learning algorithms to fit prediction models is a comparatively new approach to supporting human safety factors. This research aims to build a prediction model that identifies occupational incidents in the chemical and gas industries. This paper describes the design and approach of building and implementing a prediction model to predict the cause of an incident, which can be used as a key index for achieving industrial safety specific to the chemical and gas industries. The implementation of the scoring algorithm together with the prediction model should produce unbiased information from which to draw logical conclusions. The prediction model was trained on incident data comprising 25,700 chemical-industry incidents with accident descriptions from the last decade. Inspection data and incident logs should be processed on top of the trained dataset to verify and validate the implementation. The results of the implementation provide insight into the patterns and classifications, and also contribute to an improved understanding of quantitative and qualitative analytics. Cloud-based technology opens the gate to processing continuous in-streaming data and outputting the required results in real time. The primary technology stack used in this design comprises Apache Kafka, Apache Spark, KSQL, data frames, and AWS Lambda functions. Lambda functions are used to implement the scoring and prediction algorithms and to write the results back to AWS S3 buckets. 
A proof-of-concept implementation of the prediction model helps industries examine incidents and lays the base platform for the various protective measures that continuously benefit the workplace's reputation and growth, and reduce attrition in human resources.
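A minimal sketch of predicting an incident's cause from its accident description is a plain naive Bayes scorer over words. The toy incident log and cause labels below are invented for illustration; the paper's actual scoring algorithm and streaming deployment (Kafka/Spark/Lambda) are not reproduced here.

```python
from collections import Counter, defaultdict
import math

def train_cause_model(incidents):
    """Count word frequencies per incident cause (naive Bayes training)."""
    word_counts = defaultdict(Counter)
    cause_counts = Counter()
    for text, cause in incidents:
        cause_counts[cause] += 1
        word_counts[cause].update(text.lower().split())
    return word_counts, cause_counts

def predict_cause(model, text):
    """Pick the cause with the highest Laplace-smoothed log-probability."""
    word_counts, cause_counts = model
    total = sum(cause_counts.values())
    vocab = {w for c in word_counts.values() for w in c}
    best, best_lp = None, -math.inf
    for cause, n in cause_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[cause].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[cause][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = cause, lp
    return best

# Invented toy incident log (the real dataset has 25,700 descriptions).
incidents = [("valve leak gas", "leak"), ("pipe leak detected", "leak"),
             ("worker slip fall", "fall"), ("slip on wet floor", "fall")]
model = train_cause_model(incidents)
cause = predict_cause(model, "gas leak in pipe")  # "leak"
```

In a streaming deployment, a function like `predict_cause` would be the body of the Lambda function invoked per in-streaming record.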


2020 ◽  
Vol 21 (18) ◽  
pp. 6914
Author(s):  
Chin-Hsien Lin ◽  
Shu-I Chiu ◽  
Ta-Fu Chen ◽  
Jyh-Shing Roger Jang ◽  
Ming-Jang Chiu

Easily accessible biomarkers for Alzheimer’s disease (AD), Parkinson’s disease (PD), frontotemporal dementia (FTD), and related neurodegenerative disorders are urgently needed in an aging society to assist early-stage diagnoses. In this study, we aimed to develop machine learning algorithms using multiplex blood-based biomarkers to identify patients with different neurodegenerative diseases. Plasma samples (n = 377) were obtained from healthy controls, patients with AD spectrum (including mild cognitive impairment (MCI)), PD spectrum with variable cognitive severity (including PD with dementia (PDD)), and FTD. We measured plasma levels of amyloid-beta 42 (Aβ42), Aβ40, total Tau, p-Tau181, and α-synuclein using an immunomagnetic reduction-based immunoassay. We observed increased levels of all biomarkers except Aβ40 in the AD group when compared to the MCI group and controls. Plasma α-synuclein levels increased in PDD when compared to PD with normal cognition. We applied machine learning-based frameworks, including linear discriminant analysis (LDA) for feature extraction and several classifiers, using features from these blood-based biomarkers to classify these neurodegenerative disorders. We found that the random forest (RF) was the best classifier for separating the different dementia syndromes. Using RF, the established LDA model had an average accuracy of 76% when classifying AD, PD spectrum, and FTD. Moreover, we found accuracies of 83% and 63% when differentiating the individual disease severity of subgroups in the AD and PD spectrum, respectively. The developed LDA model with the RF classifier can assist clinicians in distinguishing variable neurodegenerative disorders.
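The LDA-for-feature-extraction plus random-forest-classifier pipeline can be sketched with scikit-learn on synthetic data. The five features below stand in for the five plasma biomarkers and the three classes for AD, PD spectrum, and FTD; none of this reproduces the paper's data, preprocessing, or tuning.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Synthetic stand-ins for the five plasma biomarkers across three diagnoses.
X = rng.normal(size=(300, 5))
y = rng.integers(0, 3, size=300)
X[y == 1] += 2.0   # shift class means so the toy classes are separable
X[y == 2] -= 2.0

# LDA projects onto at most (n_classes - 1) = 2 discriminant axes;
# a random forest then classifies the projected features.
clf = make_pipeline(LinearDiscriminantAnalysis(n_components=2),
                    RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X, y)
train_acc = clf.score(X, y)
```

With well-separated synthetic classes the training accuracy is near perfect; the paper's 76% figure reflects genuinely overlapping clinical groups.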


Author(s):  
Wei Li ◽  
Shishi Chen ◽  
Zhen Jiang ◽  
Daniel W. Apley ◽  
Zhenzhou Lu ◽  
...  

This paper describes an integrated Bayesian calibration, bias correction, and machine learning approach to the validation challenge problem posed at the Sandia Verification and Validation Challenge Workshop, May 7–9, 2014. Three main challenges are recognized as: I—identification of unknown model parameters; II—quantification of multiple sources of uncertainty; and III—validation assessment when there are no direct experimental measurements associated with one of the quantities of interest (QoIs), i.e., the von Mises stress. This paper addresses these challenges as follows. For challenge I, sensitivity analysis is conducted to select model parameters that have significant impact on the model predictions for the displacement, and then a modular Bayesian approach is performed to calibrate the selected model parameters using experimental displacement data from lab tests under the “pressure only” loading conditions. Challenge II is addressed using a Bayesian model calibration and bias correction approach. For improving predictions of displacement under “pressure plus liquid” loading conditions, a spatial random process (SRP) based model bias correction approach is applied to develop a refined predictive model using experimental displacement data from field tests. For challenge III, the underlying relationship between stress and displacement is identified by training a machine learning model on the simulation data generated from the supplied tank model. Final predictions of stress are made via the machine learning model and using predictions of displacements from the bias-corrected predictive model. The proposed approach not only allows the quantification of multiple sources of uncertainty and errors in the given computer models, but also is able to combine multiple sources of information to improve model performance predictions in untested domains.
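As a much-simplified illustration of Bayesian calibration, the sketch below computes a grid posterior over a single parameter of a toy linear "displacement" model from noisy observations. The linear model, noise level, and loads are invented assumptions; the challenge problem's modular Bayesian approach and SRP bias correction are far richer than this.

```python
import math
import random

random.seed(0)
true_theta, sigma = 2.0, 0.1
loads = [0.5, 1.0, 1.5, 2.0]
# Synthetic "experimental displacements": y = theta * load + Gaussian noise.
obs = [true_theta * l + random.gauss(0, sigma) for l in loads]

grid = [i / 100 for i in range(100, 301)]   # candidate theta in [1.0, 3.0]

def log_like(theta):
    """Gaussian log-likelihood of the observations given theta (up to a constant)."""
    return sum(-((y - theta * l) ** 2) / (2 * sigma ** 2)
               for l, y in zip(loads, obs))

lls = [log_like(t) for t in grid]
m = max(lls)                                 # subtract max for numerical stability
weights = [math.exp(ll - m) for ll in lls]   # flat prior on the grid
post_mean = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
```

The calibrated `post_mean` concentrates near the true parameter; in the challenge problem, the analogous posterior feeds the bias-corrected predictive model for the untested loading conditions.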


2021 ◽  
Vol 8 ◽  
Author(s):  
Yohei Hirano ◽  
Keito Shinmoto ◽  
Yohei Okada ◽  
Kazuhiro Suga ◽  
Jeffrey Bombard ◽  
...  

Background: Mechanically ventilated patients are susceptible to nosocomial infections such as ventilator-associated pneumonia. To treat ventilated patients with suspected infection, clinicians select appropriate antibiotics. However, decision-making regarding the use of antibiotics for methicillin-resistant Staphylococcus aureus (MRSA) is challenging because of the lack of evidence-supported criteria. This study aims to derive a machine learning model to predict MRSA as a possible pathogen responsible for infection in mechanically ventilated patients. 
Methods: Data were collected from the Medical Information Mart for Intensive Care (MIMIC)-IV database (an openly available database of patients treated at the Beth Israel Deaconess Medical Center in the period 2008–2019). Of 26,409 mechanically ventilated patients, 809 were screened for MRSA during the mechanical ventilation period and included in the study. The outcome was positivity to MRSA on screening, which was highly imbalanced in the dataset, with 93.9% positive outcomes. Therefore, after dividing the dataset into a training set (n = 566) and a test set (n = 243) for validation by stratified random sampling with a 7:3 allocation ratio, synthetic datasets with 50% positive outcomes were created by synthetic minority over-sampling for both sets individually (synthetic training set: n = 1,064; synthetic test set: n = 456). Using these synthetic datasets, we trained and validated an XGBoost machine learning model with 28 predictor variables for outcome prediction. Model performance was evaluated by area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and other statistical measurements. Feature importance was computed by the Gini method. 
Results: In validation, the XGBoost model demonstrated reliable outcome prediction with an AUROC value of 0.89 [95% confidence interval (CI): 0.83–0.95]. The model showed a high sensitivity of 0.98 [CI: 0.95–0.99], but a low specificity of 0.47 [CI: 0.41–0.54] and a positive predictive value of 0.65 [CI: 0.62–0.68]. Important predictor variables included admission from the emergency department, insertion of arterial lines, prior quinolone use, hemodialysis, and admission to a surgical intensive care unit. 
Conclusions: We were able to develop an effective machine learning model to predict positive MRSA screening during mechanical ventilation using synthetic datasets, encouraging further research toward a clinically relevant machine learning model for antibiotic stewardship.
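Synthetic minority over-sampling can be sketched as interpolation between minority-class neighbours. This is a bare-bones stand-in for SMOTE, written from its published definition; the study's 28 clinical predictors are replaced by toy 2-D points.

```python
import random

def smote_oversample(minority, n_new, k=3, rng=None):
    """SMOTE-style sketch: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    out = []
    for _ in range(n_new):
        a = rng.choice(minority)
        nbrs = sorted((p for p in minority if p is not a),
                      key=lambda p: sum((u - v) ** 2 for u, v in zip(a, p)))[:k]
        b = rng.choice(nbrs)
        t = rng.random()   # random position along the segment a -> b
        out.append(tuple(u + t * (v - u) for u, v in zip(a, b)))
    return out

# Toy 2-D minority class; five synthetic points are generated.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
synthetic = smote_oversample(minority, 5)
```

Because each synthetic point is an interpolation, it stays within the local span of the minority class rather than duplicating existing records.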


Author(s):  
Guillem Hurault ◽  
Elisa Domínguez-Hüttinger ◽  
Sinéad M. Langan ◽  
Hywel C. Williams ◽  
Reiko J. Tanaka

ABSTRACT
Background: Atopic dermatitis (AD) is a chronic inflammatory skin disease with periods of flares and remission. Designing personalised treatment strategies for AD is challenging, given the apparent unpredictability and large variation in AD symptoms and treatment responses within and across individuals. Better prediction of AD severity over time for individual patients could help to select the optimum timing and type of treatment for improving disease control.
Objective: We aimed to develop a mechanistic machine learning model that predicts the patient-specific evolution of AD severity scores on a daily basis.
Methods: We designed a probabilistic predictive model and trained it using Bayesian inference with the longitudinal data from two published clinical studies. The data consisted of daily recordings of AD severity scores and treatments used by 59 and 334 AD children over 6 months and 16 weeks, respectively. Internal and external validation of the predictive model was conducted in a forward-chaining setting.
Results: Our model was able to predict future severity scores at the individual level and improved on chance-level forecasts by 60%. Heterogeneous patterns in severity trajectories were captured with patient-specific parameters such as the short-term persistence of AD severity and responsiveness to topical steroids, calcineurin inhibitors and step-up treatment.
Conclusion: Our proof-of-principle model successfully predicted the daily evolution of AD severity scores at an individual level, and could inform the design of personalised treatment strategies that can be tested in future studies.
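The flavour of such a patient-specific model can be conveyed by a one-step-ahead sketch with a persistence parameter and a treatment-responsiveness parameter. Both parameter values and the linear update rule below are invented for illustration; the authors' actual model is a full Bayesian probabilistic formulation fitted per patient.

```python
def predict_next_severity(severity, treated, persistence=0.8, response=1.5):
    """One-day-ahead forecast: tomorrow's score is a patient-specific fraction
    of today's, reduced further if treatment (e.g. a topical steroid) is used.
    Severity scores are clipped at zero."""
    nxt = persistence * severity - (response if treated else 0.0)
    return max(0.0, nxt)

# A flare at severity 10: untreated it decays to 8.0, treated to 6.5.
untreated = predict_next_severity(10.0, treated=False)
treated = predict_next_severity(10.0, treated=True)
```

In the Bayesian setting, `persistence` and `response` would be latent variables inferred per patient from their daily severity recordings, which is what lets the model capture heterogeneous trajectories.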


“Employee turnover is a noteworthy matter in knowledge-based companies.” When an employee leaves, they carry with them tacit knowledge, often a source of competitive advantage for other firms. To stay in the market and retain its employees, an organization must therefore minimize employee attrition. This article discusses employee churn/attrition forecast models built using various machine learning methods. Model outputs are then scrutinized to outline and test best practices for employee retention at different stages of the employee’s association with an organization. This work has the potential to inform better employee retention designs and enhance employee satisfaction. This paper summarizes the capacity of these methods to learn from data and provide data-driven insights, decisions, and forecasts, and reviews the significant machine learning techniques that have been used to build predictive churn models.


Mekatronika ◽  
2020 ◽  
Vol 2 (2) ◽  
pp. 68-73
Author(s):  
Mohamad Ilyas Rizan ◽  
Muhammad Nur Aiman Shapiee ◽  
Muhammad Amirul Abdullah ◽  
Mohd Azraai Mohd Razman ◽  
Anwar P. P. Abdul Majeed

Nowadays, stroke is one of the worldwide primary causes of long-term disability. A stroke occurs when the blood supply to the brain is interrupted or reduced, depriving brain tissue of nutrients and oxygen. In the modern world, advanced technologies are revolutionizing the rehabilitation process. This research uses mechanomyography (MMG) and machine learning models to classify elbow movement, namely extension and flexion of the elbow joint. The study will aid the control of an exoskeleton for stroke patients’ rehabilitation in future studies. Five volunteers (21 to 23 years old) were recruited at Universiti Malaysia Pahang (UMP) to execute right-elbow extension and flexion movements. The movements were repeated five times each for the two active muscles involved in extension and flexion, namely the triceps and biceps. Twenty-four time-domain features were extracted from the MMG signals before being classified by a machine learning model, namely k-Nearest Neighbors (k-NN). The k-NN achieved a classification accuracy (CA) of 88.6% after the significant features were identified through an information gain approach. It may well be stated that the suggested process was able to classify elbow movement well.
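Information-gain feature selection, as used here to pick the significant MMG features, can be sketched generically for discretized features. The toy labels below (0 = extension, 1 = flexion) are invented; the paper's twenty-four time-domain features are not reproduced.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the class labels minus the entropy remaining after
    splitting the samples on the (discretized) feature value."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

# Toy labels: 0 = extension, 1 = flexion.
labels = [0, 0, 1, 1]
ig_good = information_gain([0, 0, 1, 1], labels)  # perfectly predictive feature
ig_bad = information_gain([0, 1, 0, 1], labels)   # uninformative feature
```

Ranking the twenty-four features by this score and keeping the top-scoring ones is what yields the reduced feature set fed to the k-NN classifier.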


Network ◽  
2021 ◽  
Vol 1 (2) ◽  
pp. 165-190
Author(s):  
Selim Ickin ◽  
Markus Fiedler ◽  
Konstantinos Vandikas

The development of Quality of Experience (QoE) models using Machine Learning (ML) is challenging, since it can be difficult to share datasets between research entities while protecting the intellectual property of the ML model and the confidentiality of user studies in compliance with data protection regulations such as the General Data Protection Regulation (GDPR). This makes distributed machine learning techniques that do not necessitate sharing of data or attribute names appealing. One suitable use case in the scope of QoE is the task of mapping QoE indicators for the perception of quality, such as Mean Opinion Scores (MOS), in a distributed manner. In this article, we present Distributed Ensemble Learning (DEL) and Vertical Federated Learning (vFL) to address this challenge. Both approaches can be applied to datasets that have different feature sets, i.e., split features. The DEL approach is ML model-agnostic and achieves up to 12% accuracy improvement by ensembling various generic and specific models. The vFL approach is based on neural networks and achieves accuracy on par with a conventional fully centralized machine learning model, while exhibiting statistically significant performance superior to that of isolated local models, with an average accuracy improvement of 26%. Moreover, energy-efficient vFL with a reduced network footprint and training time is obtained by further tuning the model hyper-parameters.
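The ensembling idea behind DEL can be sketched with two parties that each see only their own vertical slice of the feature vector and contribute a local MOS prediction. The per-party models and the numbers below are invented placeholders; the article's actual models and datasets are not specified at this level.

```python
def ensemble_predict(models, x):
    """Average the MOS predictions of per-party models, each applied only to
    its own vertical slice of the feature vector (no raw data is shared)."""
    preds = [model(x[sl]) for model, sl in models]
    return sum(preds) / len(preds)

# Party A owns features 0-1, party B owns features 2-3.
model_a = lambda xs: 2.0 * xs[0] + xs[1]   # A's local MOS model (toy)
model_b = lambda xs: 3.0 + xs[0]           # B's local MOS model (toy)
models = [(model_a, slice(0, 2)), (model_b, slice(2, 4))]

mos = ensemble_predict(models, [1.0, 2.0, 0.5, 0.0])  # (4.0 + 3.5) / 2
```

vFL differs in that the parties exchange intermediate neural-network activations during joint training rather than averaging finished predictions.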


2021 ◽  
Vol 14 (11) ◽  
pp. 2555-2562
Author(s):  
Ted Shaowang ◽  
Nilesh Jain ◽  
Dennis D. Matthews ◽  
Sanjay Krishnan

Recent advances in computer architecture and networking have ushered in a new age of edge computing, where computation is placed close to the point of data collection to facilitate low-latency decision making. As the complexity of such deployments grows into networks of interconnected edge devices, getting the necessary data to be in "the right place at the right time" can become a challenge. We envision a future of edge analytics where data flows between edge nodes are declaratively configured through high-level constraints. Using machine learning model-serving as a prototypical task, we illustrate how the heterogeneity and specialization of edge devices can lead to complex, task-specific communication patterns even in relatively simple situations. Without a declarative framework, managing this complexity will be challenging for developers and will lead to brittle systems. We conclude with a research vision for the database community that brings our perspective to the emergent area of edge computing.
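A toy version of declaratively configured data flows: the developer states only which stream each model needs and how fresh it must be, and a planner chooses the producing node. All node names, latency figures, and the constraint schema below are invented to illustrate the idea, not part of the authors' proposal.

```python
# Declarative constraints: what each model consumes and its latency bound.
constraints = [
    {"model": "detector", "needs": "camera_feed", "max_latency_ms": 50},
    {"model": "aggregator", "needs": "detector_out", "max_latency_ms": 500},
]
# The deployment: which streams each node produces, and at what latency.
nodes = {
    "cam1": {"produces": {"camera_feed"}, "latency_ms": 10},
    "gpu_edge": {"produces": {"detector_out"}, "latency_ms": 40},
    "cloud": {"produces": {"detector_out"}, "latency_ms": 300},
}

def plan_flows(constraints, nodes):
    """For each constraint, pick the lowest-latency node that produces the
    needed stream within the latency bound (None if no node qualifies)."""
    plan = {}
    for c in constraints:
        ok = [(info["latency_ms"], name) for name, info in nodes.items()
              if c["needs"] in info["produces"]
              and info["latency_ms"] <= c["max_latency_ms"]]
        plan[c["model"]] = min(ok)[1] if ok else None
    return plan

plan = plan_flows(constraints, nodes)
# plan == {"detector": "cam1", "aggregator": "gpu_edge"}
```

The point of the declarative framing is that when a node fails or latencies change, the planner recomputes the flows instead of developers rewiring task-specific communication by hand.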

