Predicting Job Satisfaction and Employee Turnover Using Machine Learning

Employee turnover imposes costs on a company: the departing employee must be replaced and the new employee trained. Resignations can also cause significant and costly interruptions to the production process. This gives the firm a clear incentive to prevent resignations or, at least, to anticipate when and where they will occur. If employees are asked to evaluate their superiors and the responses are made available to those superiors, it is likely that only positive feedback will be given. The aim is therefore to use machine learning techniques to predict employee turnover. Accurate predictions allow companies to make the necessary decisions on employee retention or succession planning. Algorithms: One-Sample T-Test (T-Test), Decision Tree (DT), AdaBoost (AB), Logistic Regression (LR), Random Forest Classifier (RFC).

Malicious URL Detection using Logistic Regression

10.36227/techrxiv.14790381 ◽

2021 ◽

Author(s):

Rohit Rayala ◽

Sashank Pasumarthi ◽

Rohith Kuppa ◽

S R KARTHIK

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Machine Learning Techniques ◽

Learning Techniques

This paper presents a model built to detect malicious URLs using machine learning techniques.

Machine Learning Techniques Applied to Profile Mobile Banking Users in India

International Journal of Information Systems in the Service Sector ◽

10.4018/jisss.2013010105 ◽

2013 ◽

Vol 5 (1) ◽

pp. 82-92 ◽

Cited By ~ 8

Author(s):

M. Carr ◽

V. Ravi ◽

G. Sridharan Reddy ◽

D. Veranna

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Decision Trees ◽

Multilayer Perceptron ◽

Machine Learning Techniques ◽

Mobile Banking ◽

Classification Rules ◽

Learning Techniques ◽

Potential Customers

This paper profiles mobile banking users using machine learning techniques, namely Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM, to test a research model with fourteen independent variables and one dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Using Decision Trees, the profile of the mobile banking adopter was identified. A comparison of the techniques found that Decision Trees outperformed Logistic Regression, Multilayer Perceptron, and SVM. Of all the techniques, Decision Tree is recommended for profiling studies because, apart from producing highly accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers and encourage them to adopt mobile banking by offering appropriate incentives.
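As a rough illustration of how a tree-based model can produce an 'if-then' rule of the kind mentioned above, the following minimal sketch fits a one-level tree (a decision stump) by minimising weighted Gini impurity. The feature names, survey rows, and labels are hypothetical and are not taken from the paper; the paper's actual tree algorithm and variables are not specified here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def best_rule(rows, labels, feature_names):
    """Find the single threshold split with the lowest weighted Gini impurity."""
    best = None
    n = len(rows)
    for j, name in enumerate(feature_names):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, threshold, majority(left), majority(right))
    return best


# Hypothetical survey rows: (age, perceived_usefulness on a 1-5 scale)
FEATURES = ["age", "perceived_usefulness"]
ROWS = [(25, 5), (31, 4), (45, 2), (52, 1), (29, 4), (60, 2)]
ADOPTED = ["yes", "yes", "no", "no", "yes", "no"]

score, name, threshold, if_label, else_label = best_rule(ROWS, ADOPTED, FEATURES)
rule = (f"IF {name} <= {threshold} THEN adoption = {if_label} "
        f"ELSE adoption = {else_label}")
print(rule)
```

A full decision tree applies the same split search recursively to each branch, so every path from root to leaf reads as a chain of such conditions.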

Enhancing alpine glacial lakes detection and mapping using multi-source data and machine learning techniques

10.5194/egusphere-egu2020-21811 ◽

2020 ◽

Author(s):

Sonam Wangchuk ◽

Tobias Bolch

Keyword(s):

Machine Learning ◽

Random Forest ◽

Satellite Images ◽

Random Forest Classifier ◽

Machine Learning Techniques ◽

Glacial Lake ◽

Glacial Lakes ◽

Alpine Regions ◽

Learning Techniques ◽

Source Data

Accurate detection and mapping of glacial lakes in Alpine regions such as the Himalayas, the Alps and the Andes are challenged by many factors. These factors include 1) the small size of glacial lakes, 2) cloud cover in optical satellite images, 3) cast shadows from mountains and clouds, 4) seasonal snow in satellite images, 5) varying degrees of turbidity amongst glacial lakes, and 6) frozen glacial lake surfaces. In our study, we propose a fully automated approach that overcomes most of the above-mentioned challenges to detect and map glacial lakes accurately using multi-source data and machine learning techniques such as the random forest classifier algorithm. The multi-source data are from the Sentinel-1 Synthetic Aperture Radar data (radar backscatter), the Sentinel-2 multispectral instrument data (NDWI), and the SRTM digital elevation model (slope). We use these data as inputs for the rule-based segmentation of potential glacial lakes, where decision rules are implemented from the expert system. The potential glacial lake polygons are then classified either as glacial lakes or non-glacial lakes by the trained and tested random forest classifier algorithm. The performance of the method was assessed in eight test sites located across the Alpine regions of the world (e.g. the Boshula mountain range and Koshi basin in the Himalayas, the Tajiks Pamirs, the Swiss Alps and the Peruvian Andes). We show that the proposed method performs efficiently irrespective of geographic, geologic, climatic, and glacial lake conditions.
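The rule-based step described above can be sketched for a single pixel as follows. NDWI is computed with the standard green/near-infrared formula, (green - NIR) / (green + NIR). The three threshold values, and the choice to combine the rules with a logical AND, are assumptions made purely for illustration; the abstract does not give the authors' actual decision rules.

```python
def ndwi(green, nir):
    """Normalized Difference Water Index: (green - nir) / (green + nir)."""
    total = green + nir
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (green - nir) / total


def is_candidate_lake(green, nir, backscatter_db, slope_deg,
                      ndwi_min=0.3, backscatter_max_db=-18.0,
                      slope_max_deg=10.0):
    """Rule-based test for a potential lake pixel.

    All three thresholds are hypothetical placeholders, not values
    from the study: water tends to show high NDWI, low radar
    backscatter, and a near-flat surface.
    """
    return (ndwi(green, nir) > ndwi_min
            and backscatter_db < backscatter_max_db
            and slope_deg < slope_max_deg)


# Hypothetical pixels: (green reflectance, NIR reflectance, backscatter dB, slope deg)
pixels = [
    (0.12, 0.04, -22.0, 3.0),   # water-like and flat
    (0.12, 0.04, -22.0, 25.0),  # water-like index but on a steep slope
    (0.10, 0.30, -9.0, 5.0),    # vegetation-like: NIR brighter than green
]
candidates = [is_candidate_lake(*pixel) for pixel in pixels]
print(candidates)
```

In the workflow the abstract describes, pixels or polygons passing such rules are only candidates; the final lake versus non-lake decision is left to the trained random forest classifier.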

Machine learning versus logistic regression methods for 2-year mortality prognostication in a small, heterogeneous glioma database

10.1101/472555 ◽

2018 ◽

Cited By ~ 2

Author(s):

Sandip S Panesar ◽

Rhett N D’Souza ◽

Fang-Cheng Yeh ◽

Juan C Fernandez-Miranda

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Machine Learning Techniques ◽

World Health ◽

Support Vector ◽

Molecular Characteristics ◽

Regression Methods ◽

Learning Techniques ◽

The World ◽

Health Organization

Abstract. Background: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization or prediction. ML techniques have been traditionally applied to large, highly-dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathological features. Recently the World Health Organization proposed a novel grading system for gliomas incorporating molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, highly-dimensional database of glioma patients. Methods: We applied three machine learning techniques: artificial neural networks (ANN), decision trees (DT), support vector machine (SVM), and classical logistic regression (LR) to a dataset consisting of 76 glioma patients of all grades. We compared the effect of applying the algorithms to the raw database, versus a database where only statistically significant features were included into the algorithmic inputs (feature selection). Results: Raw input consisted of 21 variables, and achieved performance of (accuracy/AUC): 70.7%/0.70 for ANN, 68%/0.72 for SVM, 66.7%/0.64 for LR and 65%/0.70 for DT. Feature selected input consisted of 14 variables and achieved performance of 73.4%/0.75 for ANN, 73.3%/0.74 for SVM, 69.3%/0.73 for LR and 65.2%/0.63 for DT. Conclusions: We demonstrate that these techniques can also be applied to small, yet highly-dimensional datasets. Our ML techniques achieved reasonable performance compared to similar studies in the literature. Though local databases may be small versus larger cancer repositories, we demonstrate that ML techniques can still be applied to their analysis, though traditional statistical methods are of similar benefit.
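For readers unfamiliar with the two metrics reported above, the sketch below computes accuracy and AUC from scratch. AUC is calculated here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. The labels, scores, and the 0.5 decision threshold are hypothetical and unrelated to the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = event occurred) and model scores
Y_TRUE = [1, 0, 1, 1, 0, 0, 1, 0]
SCORES = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
Y_PRED = [1 if s >= 0.5 else 0 for s in SCORES]  # assumed 0.5 threshold

print(accuracy(Y_TRUE, Y_PRED), roc_auc(Y_TRUE, SCORES))
```

Accuracy depends on the chosen threshold while AUC does not, which is one reason the two figures can rank models differently, as they do for SVM and ANN on the raw input above.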

Applying Machine Learning Techniques for Performing Comparative Opinion Mining

Open Computer Science ◽

10.1515/comp-2020-0148 ◽

2020 ◽

Vol 10 (1) ◽

pp. 461-477

Author(s):

Umair Younis ◽

Muhammad Zubair Asghar ◽

Adil Khan ◽

Alamsher Khan ◽

Javed Iqbal ◽

...

Keyword(s):

Machine Learning ◽

Opinion Mining ◽

Random Forest Classifier ◽

Machine Learning Techniques ◽

Product Reviews ◽

Business Organizations ◽

Machine Learning Classifiers ◽

Learning Techniques ◽

Improved Accuracy ◽

Comparative Opinion Mining

Abstract. In recent times, comparative opinion mining applications have attracted both individuals and business organizations seeking to compare the strengths and weaknesses of products. Prior works on comparative opinion mining have focused on applying a single classifier, limited comparative opinion labels, and limited datasets of product reviews, resulting in degraded performance for classifying comparative reviews. In this work, we perform multi-class comparative opinion mining by applying multiple machine learning classifiers using an increased number of comparative opinion labels (9 classes) on 4 datasets of comparative product reviews. The experimental results show that the Random Forest classifier outperformed the compared algorithms in terms of accuracy, precision, recall and f-measure.

A comparative study of logistic regression based machine learning techniques for prediction of early virological suppression in antiretroviral initiating HIV patients

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-018-0659-x ◽

2018 ◽

Vol 18 (1) ◽

Cited By ~ 5

Author(s):

Kuteesa R. Bisaso ◽

Susan A. Karungi ◽

Agnes Kiragga ◽

Jackson K. Mukonzo ◽

Barbara Castelnuovo

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Comparative Study ◽

Virological Suppression ◽

Machine Learning Techniques ◽

Hiv Patients ◽

Learning Techniques

Don’t Dismiss Logistic Regression: The Case for Sensible Extraction of Interactions in the Era of Machine Learning

10.1101/2019.12.15.877134 ◽

2019 ◽

Cited By ~ 1

Author(s):

Joshua J. Levy ◽

A. James O’Malley

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Model Building ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Statistical Machine Learning ◽

Forest Model ◽

Learning Techniques ◽

Modeling Techniques

Abstract. Background: Machine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at their solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each. Methods: We present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-world biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions. Results: Preliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output. Conclusions: When a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting.

Comparison of Statistical Logistic Regression and RandomForest Machine Learning Techniques in Predicting Diabetes

Journal of Advances in Information Technology ◽

10.12720/jait.11.2.78-83 ◽

2020 ◽

pp. 78-83

Author(s):

Tahani Daghistani ◽

Riyad Alshammari

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Machine Learning Techniques ◽

Learning Techniques

Aprendizado de Máquina Aplicado à Predição de Doenças Cardiometabólicas com Utilização de Indicadores Metabólicos e Comportamentais de Risco à Saúde

10.14210/cotb.v12.p301-308 ◽

2021 ◽

Author(s):

Alan Lopes de Sousa Freitas ◽

Ana Silvia Degasperi Ieker ◽

Josiane Melchiori Pinheiro ◽

Wilson Rinaldi ◽

Heloise Manica Paris Teixeira

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Logistic Regression ◽

Decision Tree ◽

Causes Of Death ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cardiometabolic Diseases ◽

Learning Techniques ◽

Good Classification

Cardiometabolic diseases developed over a worker’s life, such as hypertension, diabetes, dyslipidemia and obesity, are among the main causes of death and are associated with modifiable and controllable risk factors. The general objective of this study was to apply supervised machine learning techniques and to compare their performance in predicting the risk of developing cardiometabolic disease among employees working at a school hospital in southern Brazil. We sought to map the characteristics of individuals who are more likely to develop cardiometabolic diseases. The machine learning models evaluated were Naive Bayes, Decision Tree, Random Forest, KNN, Logistic Regression and SVM. The results obtained in the experiments showed that some supervised machine learning models produce a good classification, depending on the attributes and hyperparameters used.

The effects of applying filters on EEG signals for classifying developers’ code comprehension

Journal of Applied Research and Technology ◽

10.22201/icat.24486736e.2021.19.6.1299 ◽

2021 ◽

Vol 19 (6) ◽

pp. 584-602

Author(s):

Lucian Jose Gonçales ◽

Kleinner Farias ◽

Lucas Kupssinskü ◽

Matheus Segalotto

Keyword(s):

Machine Learning ◽

Software Engineering ◽

Random Forest ◽

Random Forest Classifier ◽

Machine Learning Techniques ◽

Eeg Signals ◽

Machine Learning Technique ◽

Learning Techniques ◽

Learning Technique ◽

F Measure

EEG signals are a relevant indicator for measuring aspects related to human factors in software engineering. EEG is used in software engineering to train machine learning techniques for a wide range of applications, including classifying task difficulty and developers’ level of experience. The EEG signal contains noise such as abnormal readings, electrical interference, and eye movements, which are usually not of interest to the analysis and therefore reduce the precision of the machine learning techniques. However, research in software engineering has not yet demonstrated the effectiveness of applying filters to EEG signals. The objective of this work is to analyze the effectiveness of filters on EEG signals in the software engineering context. As the literature has not focused on the classification of developers’ code comprehension, this study analyzes the effectiveness of applying EEG filters when training a machine learning technique to classify developers' code comprehension. A Random Forest (RF) classifier was trained with filtered EEG signals to classify the developers' code comprehension. This study also trained another random forest classifier with unfiltered EEG data. Both models were trained using 10-fold cross-validation. This work measures the classifiers' effectiveness using the f-measure metric. This work used the t-test, the Wilcoxon test, and the Mann-Whitney U test to analyze the difference in effectiveness (f-measure) between the classifier trained with filtered EEG and the classifier trained with unfiltered EEG. The tests indicated a significant difference after applying EEG filters to classify developers' code comprehension with the random forest classifier. The conclusion is that the use of EEG filters significantly improves the effectiveness of classifying code comprehension using the random forest technique.
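The evaluation protocol described above rests on two building blocks, k-fold cross-validation and the f-measure, which the sketch below implements in plain Python. It is an illustration under stated assumptions: the folds are contiguous and unshuffled, the labels are binary, and the example labels are hypothetical. It does not reproduce the study's classifier or its statistical tests.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds.

    When n_samples is not divisible by k, the first folds get one
    extra sample each. No shuffling is done in this sketch.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical usage: 25 samples split into 10 folds
folds = list(k_fold_indices(25, k=10))
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)

# Hypothetical true and predicted labels for one fold
print(f_measure([1, 1, 0, 0], [1, 0, 1, 0]))
```

Computing one f-measure per fold for each classifier gives paired samples, which is the kind of input a paired t-test or Wilcoxon signed-rank test would compare.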
