A Proposed Model for Predicting Employee Turnover of Information Technology Specialists Using Data Mining Techniques

This article proposes a data mining framework for identifying the significant factors behind employee turnover. Using support vector machine, decision tree, deep learning, random forest, and other classification algorithms, the authors propose a feature prediction framework to determine the factors that influence employee turnover. The proposed framework categorizes a set of historical behavior features such as years at company, overtime, performance rating, years since last promotion, and total working years. It also classifies demographic features such as age, monthly income, distance from home, marital status, education, and gender, and uses attitudinal employee characteristics to determine the reasons for employee turnover in the information technology sector. The monthly rate, overtime, and employee age were found to be the most significant factors causing employee turnover.

Early Detection of Red Palm Weevil, Rhynchophorus ferrugineus (Olivier), Infestation Using Data Mining

Plants ◽

10.3390/plants10010095 ◽

2021 ◽

Vol 10 (1) ◽

pp. 95

Author(s):

Heba Kurdi ◽

Amal Al-Aldawsari ◽

Isra Al-Turaiki ◽

Abdulrahman S. Aldawood

Keyword(s):

Data Mining ◽

Plant Size ◽

Support Vector ◽

Classification Algorithms ◽

Palm Tree ◽

Rhynchophorus Ferrugineus ◽

Red Palm Weevil ◽

Palm Weevil ◽

Using Data ◽

F Measure

In the past 30 years, the red palm weevil (RPW), Rhynchophorus ferrugineus (Olivier), a pest that is highly destructive to all types of palms, has rapidly spread worldwide. However, detecting infestation with the RPW is highly challenging because symptoms are not visible until the death of the palm tree is inevitable. In addition, the use of automated RPW identification tools to predict infestation is complicated by a lack of RPW datasets. In this study, we assessed the capability of 10 state-of-the-art data mining classification algorithms, Naive Bayes (NB), KSTAR, AdaBoost, bagging, PART, J48 decision tree, multilayer perceptron (MLP), support vector machine (SVM), random forest, and logistic regression, to use plant-size and temperature measurements collected from individual trees to predict RPW infestation in its early stages, before significant damage is caused to the tree. The performance of the classification algorithms was evaluated in terms of accuracy, precision, recall, and F-measure using a real RPW dataset. The experimental results showed that infestations with RPW can be predicted using data mining with an accuracy of up to 93%, precision above 87%, recall of 100%, and F-measure greater than 93%. Additionally, we found that temperature and circumference are the most important features for predicting RPW infestation. However, we strongly call for collecting and aggregating more RPW datasets to run more experiments to validate these results and provide more conclusive findings.
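The four evaluation measures named in the abstract above can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts of a binary classifier (e.g. infested vs. healthy).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only (assumption, not data from the paper).
metrics = classification_metrics(tp=40, fp=5, fn=0, tn=55)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A recall of 100% corresponds to `fn == 0`: no infested tree is missed, although some healthy trees may still be flagged, which is what keeps precision below 100%.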

Osteoporosis Risk Prediction Using Data Mining Algorithms

Journal of Community Health Research ◽

10.18502/jchr.v9i2.3401 ◽

2020 ◽

Author(s):

Efat Jabarpour ◽

Amin Abedini ◽

Abbasali Keshtkar

Keyword(s):

Data Mining ◽

Personal Information ◽

Disease Diagnosis ◽

Support Vector ◽

Data Mining Algorithms ◽

Industry Standard ◽

Disease Information ◽

Increased Risk ◽

Using Data ◽

Mining Algorithms

Introduction: Osteoporosis is a disease that reduces bone density and degrades the quality of bone microstructure, leading to an increased risk of fractures. It is one of the major causes of disability and death in elderly people. The current study aims to determine the factors influencing the incidence of osteoporosis and to provide a predictive model for diagnosing the disease, in order to increase diagnostic speed and reduce diagnostic costs. Methods: Individuals' data, including personal information, lifestyle, and disease information, were reviewed. A new model is presented based on the Cross-Industry Standard Process (CRISP) methodology. Support Vector Machine (SVM) and a Bayesian method, Tree Augmented Naïve Bayes (TAN), were used as data mining methods, with Clementine12 as the data mining tool. Results: Several features were found to affect this disease. Rules were extracted that can be used as a pattern for predicting patients' status. Classification precision was calculated to be 88.39% for SVM and 91.29% for TAN, so the precision of TAN was higher than that of the other methods. Conclusion: The most effective factors concerning osteoporosis were identified and can be used to predict the possibility of osteoporosis in a new person with defined characteristics.

A Comparison of English Materials for Business Management and Information Technology Using Data Mining

10.9734/bpi/nvst/v3/1871c ◽

2021 ◽

pp. 131-143

Author(s):

Hiromi Ban ◽

Hidetaka Nambo ◽

Takashi Oyabu

Keyword(s):

Data Mining ◽

Information Technology ◽

Business Management ◽

Using Data

Data-Driven Modelling of Smart Building Ventilation Subsystem

Journal of Sensors ◽

10.1155/2019/3572019 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14 ◽

Cited By ~ 5

Author(s):

Grigore Stamatescu ◽

Iulia Stamatescu ◽

Nicoleta Arghira ◽

Ioana Fagarasan

Keyword(s):

Data Mining ◽

Data Streams ◽

Data Driven ◽

Support Vector ◽

Commercial Building ◽

Monitoring And Control ◽

Smart Building ◽

Building Ventilation ◽

Using Data ◽

Rich Data

Considering the advances in building monitoring and control through networks of interconnected devices, effective handling of the associated rich data streams is becoming an important challenge. In many situations, the application of conventional system identification or approximate grey-box models, partly theoretical and partly data driven, is either infeasible or unsuitable. The paper discusses and illustrates an application of black-box modelling, achieved using data mining techniques, for the control of a smart building ventilation subsystem. We present the implementation and evaluation of a data mining methodology on data collected from over one year of operation. The case study is carried out on four air handling units of a modern campus building to provide preliminary decision support for facility managers. The data processing and learning framework is based on two steps: raw data streams are compressed using the Symbolic Aggregate Approximation method, and the resulting segments are then input into a Support Vector Machine algorithm. The results are useful for deriving the behaviour of each piece of equipment in various modes of operation and can be built upon for fault detection or energy efficiency applications. Challenges related to online operation within a commercial Building Management System are also discussed, as the approach shows promise for deployment.
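The first step described above, compressing a raw stream with Symbolic Aggregate Approximation (SAX), can be sketched as follows. This is a generic textbook-style SAX under stated assumptions, not the paper's implementation: the function name `sax`, the four-letter alphabet, and the sample series are hypothetical, and the series length is assumed to be a multiple of the number of segments (any trailing remainder is dropped).

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length n_segments."""
    seg_len = len(series) // n_segments
    if seg_len == 0:
        raise ValueError("series is shorter than the number of segments")

    # 1. Z-normalise the series (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma else 0.0 for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each equal-width segment.
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # 3. Breakpoints split the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Map each segment mean to the symbol of the region it falls in.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# Illustrative readings only (assumption, not data from the paper).
print(sax([1, 1, 2, 2, 8, 8, 9, 9], n_segments=4))  # prints "aadd"
```

The resulting words (or per-segment symbols) would then be encoded as features for a classifier such as an SVM; that second step is omitted here because it depends on an external library.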

Email Worm Detection Using Data Mining

Techniques and Applications for Advanced Information Privacy and Security ◽

10.4018/978-1-60566-210-7.ch002 ◽

2011 ◽

pp. 20-34

Author(s):

Mohammad M. Masud ◽

Latifur Khan ◽

Bhavani Thuraisingham

Keyword(s):

Data Mining ◽

Feature Selection ◽

Principal Component ◽

Classification Model ◽

Support Vector ◽

Two Phase ◽

Feature Selection Technique ◽

Worm Detection ◽

Phase Selection ◽

Using Data

This chapter applies data mining techniques to detect email worms. Email messages contain a number of different features, such as the total number of words in the message body or subject, the presence or absence of binary attachments, the type of attachments, and so on. The goal is to obtain an efficient classification model based on these features. The solution consists of several steps. First, the number of features is reduced using two different approaches: feature selection and dimension reduction. This step is necessary to reduce noise and redundancy in the data. The feature-selection technique is called Two-phase Selection (TPS), which is a novel combination of a decision tree and a greedy selection algorithm. The dimension reduction is performed by Principal Component Analysis. Second, the reduced data is used to train a classifier. Different classification techniques have been used, such as Support Vector Machine (SVM), Naïve Bayes, and their combination. Finally, the trained classifiers are tested on a dataset containing both known and unknown types of worms. These results have been compared with published results. It is found that the proposed TPS selection along with SVM classification achieves the best accuracy in detecting both known and unknown types of worms.

Prognosis of Diabetes Using Data mining Approach-Fuzzy C Means Clustering and Support Vector Machine

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v11p120 ◽

2014 ◽

Vol 11 (2) ◽

pp. 94-98 ◽

Cited By ~ 14

Author(s):

Ravi Sanakal ◽

Smt. T Jayakumari

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Support Vector ◽

Fuzzy C Means ◽

Data Mining Approach ◽

Fuzzy C Means Clustering ◽

Using Data

A Proposed Model for Predicting Employees’ Performance Using Data Mining Techniques

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9274 ◽

2020 ◽

Vol 17 (8) ◽

pp. 3804-3809

Author(s):

A. Yovan Felix ◽

Karthik Reddy Vuyyuru ◽

Viswas Puli

Keyword(s):

Data Mining ◽

Resource Management ◽

Human Resource Management ◽

Human Resource ◽

Human Resource Professionals ◽

Quality Factors ◽

Proposed Model ◽

Opportune Time ◽

Using Data ◽

The Impact

Human resource management has become one of the essential interests of managers and decision makers in almost all types of businesses, who adopt plans for correctly identifying highly qualified employees. Accordingly, management becomes interested in the performance of these employees, particularly in ensuring that the appropriate person is assigned to the suitable job at the right time. Hence the growing interest in data mining, whose objective is the discovery of knowledge from huge amounts of data. Three main data mining techniques were used to build the classification model and to identify the factors that most strongly affect performance. To obtain a highly accurate model, several experiments were carried out based on these techniques, which are implemented in the WEKA tool, enabling decision makers and human resource professionals to predict and improve the performance of their employees. This paper also uses Hadoop to process large volumes of data so that this impact can be determined.

Data mining, fuzzy AHP and TOPSIS for optimizing taxpayer supervision

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i1.pp75-87 ◽

2020 ◽

Vol 18 (1) ◽

pp. 75

Author(s):

M. Jupri ◽

Riyanarto Sarno

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Fuzzy Ahp ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Mining Algorithms ◽

Using Data ◽

Time Required ◽

Mining Algorithms

Achieving optimal tax revenue requires effective and efficient tax supervision, which can be achieved by classifying taxpayer compliance with tax regulations. Considering this issue, this paper proposes the classification of taxpayer compliance using data mining algorithms, i.e. C4.5, Support Vector Machine, K-Nearest Neighbor, Naive Bayes, and Multilayer Perceptron, based on taxpayer compliance data. Taxpayer compliance can be classified into four classes: (1) formal and material compliant taxpayers, (2) formal compliant taxpayers, (3) material compliant taxpayers, and (4) formal and material non-compliant taxpayers. Furthermore, the results of the data mining algorithms are compared using Fuzzy AHP and TOPSIS to determine the best-performing classifier based on the criteria of accuracy, F-score, and time required. The priority of taxpayers for more detailed supervision at each level of taxpayer compliance is ranked using Fuzzy AHP and TOPSIS based on criteria drawn from the dataset variables. The results show that C4.5 is the best-performing classifier, with a preference value of 0.998, whereas the MLP algorithm has the lowest preference value of 0.131. Alternative taxpayer A233 is the top priority taxpayer, with a preference value of 0.433, whereas alternative taxpayer A051 is the lowest priority taxpayer, with a preference value of 0.036.
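The preference values mentioned above are the kind of closeness score that TOPSIS produces for each alternative. The sketch below shows plain (crisp) TOPSIS only; it assumes the criterion weights are already known, whereas the paper derives them with Fuzzy AHP, which is not reproduced here. The function name `topsis`, the weights, and the sample values are hypothetical, and the sketch assumes the alternatives are not all identical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True where larger is better, False where smaller is better
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]

    # Ideal best and worst value per criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]

    # Closeness = distance to worst / (distance to best + distance to worst).
    scores = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Illustrative values only: columns are accuracy, F-score, time in seconds.
candidates = {"clf_1": [0.90, 0.88, 2.0],
              "clf_2": [0.80, 0.79, 5.0],
              "clf_3": [0.85, 0.90, 1.0]}
scores = topsis(list(candidates.values()),
                weights=[0.4, 0.4, 0.2],
                benefit=[True, True, False])
for name, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

Time is marked as a cost criterion (`benefit=False`), so a faster classifier moves closer to the ideal solution even when its accuracy is slightly lower.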

Fusion of Unobtrusive Sensing Solutions for Home-Based Activity Recognition and Classification Using Data Mining Models and Methods

Applied Sciences ◽

10.3390/app11199096 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9096

Author(s):

Idongesit Ekerete ◽

Matias Garcia-Constantino ◽

Alexandros Konios ◽

Mustafa A. Mustafa ◽

Yohanca Diaz-Skeete ◽

...

Keyword(s):

Data Mining ◽

Activity Recognition ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Home Environments ◽

Use Of Data ◽

Home Based ◽

Actual Recognition ◽

Using Data ◽

Unobtrusive Sensing

This paper proposes the fusion of Unobtrusive Sensing Solutions (USSs) for human Activity Recognition and Classification (ARC) in home environments. It also considers the use of data mining models and methods for cluster-based analysis of datasets obtained from the USSs. The ability to recognise and classify activities performed in home environments can help monitor health parameters in vulnerable individuals. This study addresses five principal concerns in ARC: (i) users’ privacy, (ii) wearability, (iii) data acquisition in a home environment, (iv) actual recognition of activities, and (v) classification of activities from single to multiple users. Timestamp information from contact sensors mounted at strategic locations in a kitchen environment helped obtain the time, location, and activity of 10 participants during the experiments. A total of 11,980 thermal blobs gleaned from privacy-friendly USSs such as ceiling and lateral thermal sensors were fused using data mining models and methods. Experimental results demonstrated cluster-based activity recognition, classification, and fusion of the datasets with an average regression coefficient of 0.95 for tested features and clusters. In addition, a pooled mean accuracy of 96.5% was obtained on the evaluation test using classification-by-clustering and statistical methods for models such as Neural Network, Support Vector Machine, K-Nearest Neighbour, and Stochastic Gradient Descent.

Employing Data Mining Techniques for Predicting Opioid Withdrawal in Applicants of Health Centers

UHD Journal of Science and Technology ◽

10.21928/uhdjst.v3n2y2019.pp33-40 ◽

2019 ◽

Vol 3 (2) ◽

pp. 33

Author(s):

Raheleh Hamedanizad ◽

Elham Bahmani ◽

Mojtaba Jamshidi ◽

Aso Mohammad Darwesh

Keyword(s):

Data Mining ◽

Mean Squared Error ◽

Opioid Withdrawal ◽

Health Centers ◽

Statistical Population ◽

Data Mining Algorithms ◽

Proposed Model ◽

Meta Learning ◽

Using Data ◽

Mining Algorithms

Addiction to narcotics is one of the greatest health challenges in today’s world. It has become a serious threat to social, economic, and cultural structures, has ruined part of society's active workforce, and is one of the main factors in the growth of diseases such as HIV and hepatitis. Today, addiction is recognized as a disease, and the welfare organization and many of its dependent centers try to help addicts treat it. In this study, using data mining algorithms and data collected from opioid withdrawal applicants referred to the welfare organization, a model is proposed to predict the success of opioid withdrawal applicants. The statistical population comprises opioid withdrawal applicants in a welfare organization and includes 26 features of 793 instances, including men and women. The proposed model is a combination of meta-learning algorithms (decorate and bagging) and the J48 decision tree, implemented in the Weka data mining software. The efficiency of the proposed model is evaluated in terms of precision, recall, Kappa, and root mean squared error, and the results are compared with algorithms such as multilayer perceptron neural network, Naive Bayes, and Random Forest. The results of various experiments showed that the precision of the proposed model is 71.3%, which is superior to that of the other compared algorithms.
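Of the evaluation measures listed above, Kappa is the least self-explanatory: it compares observed agreement between predicted and actual labels with the agreement expected by chance. The following is a minimal sketch of Cohen's kappa, assuming that is the Kappa statistic meant here; the function name `cohen_kappa` and the sample labels are hypothetical, and the sketch assumes at least two distinct classes appear, since kappa is undefined when chance agreement equals 1.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(actual)
    # Observed agreement: fraction of instances classified correctly.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n

    # Expected agreement by chance, from the marginal class frequencies.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(actual_counts[c] * predicted_counts[c]
                   for c in actual_counts) / (n * n)

    return (observed - expected) / (1 - expected)


# Illustrative labels only (1 = successful withdrawal, 0 = unsuccessful).
actual = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(cohen_kappa(actual, predicted))  # prints 0.5
```

A kappa of 0 means the model agrees with the true labels no more often than chance would, which makes it a useful complement to precision when the classes are imbalanced.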
