Addressing the issue of undeclared work – Part I: Applying associative classification per the CRISP-DM methodology

2021 ◽  
pp. 1-27
Author(s):  
Eleni Alogogianni ◽  
Maria Virvou

Addressing undeclared work is a high priority in the labor field for government policymakers, since it adversely affects all involved parties and results in significant losses in tax and social security contribution revenues. In recent years, the wide use of ICT in labor inspectorates and the considerable progress in data exchange have resulted in numerous databases dispersed across various units, yet these are not used effectively to increase the productivity of the inspectorates' functions. This study presents a detailed analysis of a data mining project, following the CRISP-DM methodology, that aims to assist labor inspectorates in dealing with undeclared work and other labor law violations. It uses real past inspection data merged with company characteristics and employment details, and examines the application of two associative classification algorithms, CBA and CBA2, in combination with two types of datasets: a binary one and a four-class one. The produced models are assessed against the data mining goals and the initial business objectives, and the research concludes by proposing an innovative inspection recommendation tool shown to offer two major benefits: a mechanism for planning targeted inspections of improved efficiency, and a knowledge repository that enhances inspectors' understanding of the features linked with labor law violations.
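
For readers unfamiliar with associative classification, the sketch below illustrates the general CBA idea: mine class association rules that pass minimum support and confidence thresholds, rank them by confidence and then support, keep the rules that correctly cover still-uncovered training rows, and fall back to a default class. It is a simplified illustration under stated assumptions, not the authors' pipeline: rule mining is brute force, pruning is a simplified form of CBA's database-coverage step, and the attributes `sector` and `size`, the labels, and the thresholds are hypothetical toy values, not data from the study.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # conditions as (attribute, value) pairs
    label: str             # predicted class
    support: float         # fraction of all rows matching antecedent AND label
    confidence: float      # fraction of antecedent-matching rows with this label


def satisfies(conditions, row):
    """True if the row meets every (attribute, value) condition."""
    return all(row.get(attr) == value for attr, value in conditions)


def mine_rules(rows, labels, min_support, min_confidence, max_len=2):
    """Brute-force class association rule mining; only suitable for toy data."""
    n = len(rows)
    items = sorted({item for row in rows for item in row.items()})
    rules = []
    for k in range(1, max_len + 1):  # level-wise: shorter rules are generated first
        for antecedent in combinations(items, k):
            if len({attr for attr, _ in antecedent}) < k:
                continue  # skip antecedents that test one attribute twice
            covered = [lbl for row, lbl in zip(rows, labels) if satisfies(antecedent, row)]
            for label, count in Counter(covered).items():
                support, confidence = count / n, count / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    rules.append(Rule(frozenset(antecedent), label, support, confidence))
    # Rank by confidence, then support; the stable sort keeps generation order on ties
    rules.sort(key=lambda r: (-r.confidence, -r.support))
    return rules


def build_classifier(rules, rows, labels):
    """Simplified coverage pruning: keep a rule if it correctly classifies at
    least one uncovered training row, then drop every row it covers."""
    remaining = list(zip(rows, labels))
    selected = []
    for rule in rules:
        if not remaining:
            break
        covered = [(row, lbl) for row, lbl in remaining if satisfies(rule.antecedent, row)]
        if any(lbl == rule.label for _, lbl in covered):
            selected.append(rule)
            remaining = [(row, lbl) for row, lbl in remaining
                         if not satisfies(rule.antecedent, row)]
    # Default class: majority label of uncovered rows, or of all rows if none remain
    pool = [lbl for _, lbl in remaining] or labels
    return selected, Counter(pool).most_common(1)[0][0]


def predict(classifier, row):
    """Apply the first matching rule in rank order, else the default class."""
    selected, default = classifier
    for rule in selected:
        if satisfies(rule.antecedent, row):
            return rule.label
    return default


# Hypothetical toy inspection records (not data from the study)
rows = [
    {"sector": "retail", "size": "small"}, {"sector": "retail", "size": "large"},
    {"sector": "food", "size": "small"}, {"sector": "food", "size": "small"},
    {"sector": "food", "size": "large"}, {"sector": "retail", "size": "small"},
]
labels = ["violation", "compliant", "violation", "violation", "compliant", "violation"]
rules = mine_rules(rows, labels, min_support=0.2, min_confidence=0.8)
classifier = build_classifier(rules, rows, labels)
print(predict(classifier, {"sector": "food", "size": "large"}))  # compliant
```

On this toy data, the rules `size=small -> violation` and `size=large -> compliant` cover every training row, so the longer rules are pruned and the default class is the overall majority label, `violation`.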

2018 ◽  
Vol 17 (04) ◽  
pp. 1850043
Author(s):  
Faisal Aburub ◽  
Wa’el Hadi

In this paper, we study the problem of predicting new groundwater locations in Jordan by applying a proposed new method, Groundwater Prediction using Associative Classification (GwPAC). We identify features that differentiate groundwater well locations according to whether or not they contain water. In addition, we survey intelligence-based methods related to groundwater exploration and management. Three experimental analyses were conducted with the objective of evaluating the capability of data mining algorithms using real groundwater data from the Ministry of Water and Irrigation. In the first experiment, we compared the performance of GwPAC against three well-known associative classification algorithms, namely CBA, CMAR and FACA. In the second experiment, three rule-based algorithms (C4.5, Random Forest and PBC4cip) were investigated; in the third experiment, to generalise the capability of using data mining for solving the groundwater detection problem, four benchmark algorithms (SVMs, NB, KNN and ANNs) were evaluated. Across all the experiments, the results indicated that all considered data mining algorithms predict groundwater locations with an acceptable classification rate (all classification accuracies [Formula: see text]%), and can be useful methods when seeking to address the problem of exploring new groundwater locations.


Plants ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 95
Author(s):  
Heba Kurdi ◽  
Amal Al-Aldawsari ◽  
Isra Al-Turaiki ◽  
Abdulrahman S. Aldawood

In the past 30 years, the red palm weevil (RPW), Rhynchophorus ferrugineus (Olivier), a pest that is highly destructive to all types of palms, has spread rapidly worldwide. Detecting RPW infestation is highly challenging, however, because symptoms do not become visible until the death of the palm tree is inevitable. In addition, the use of automated RPW identification tools to predict infestation is complicated by a lack of RPW datasets. In this study, we assessed the capability of 10 state-of-the-art data mining classification algorithms (Naive Bayes (NB), KSTAR, AdaBoost, bagging, PART, J48 decision tree, multilayer perceptron (MLP), support vector machine (SVM), random forest, and logistic regression) to use plant-size and temperature measurements collected from individual trees to predict RPW infestation in its early stages, before significant damage is caused to the tree. The performance of the classification algorithms was evaluated in terms of accuracy, precision, recall, and F-measure using a real RPW dataset. The experimental results showed that, using data mining, RPW infestation can be predicted with an accuracy of up to 93%, precision above 87%, recall of 100%, and an F-measure greater than 93%. Additionally, we found that temperature and circumference are the most important features for predicting RPW infestation. Nevertheless, we strongly call for collecting and aggregating more RPW datasets to run further experiments that validate these results and provide more conclusive findings.
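
The four evaluation measures named above are all computed from the confusion-matrix counts of a binary classifier. The minimal sketch below shows the standard definitions; the labels are hypothetical (1 stands for an infested tree) and are not taken from the RPW dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    tn = len(pairs) - tp - fp - fn                                    # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical labels: every infested tree is caught, plus one false alarm
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Here recall is 1.0 while precision is only 0.75 (accuracy 0.875, F-measure about 0.857), which shows how a classifier can reach 100% recall with a lower precision, the kind of combination reported above.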


2021 ◽  
Author(s):  
Kek Zhi Xuan ◽  
Shuhaida Ismail ◽  
Intan Syazwani Noorain ◽  
Nur Aliaa Dalila A. Muhaime

2009 ◽  
Vol 36 (10) ◽  
pp. 2829-2839 ◽  
Author(s):  
Nikolaos Mastrogiannis ◽  
Basilis Boutsinas ◽  
Ioannis Giannikos

Author(s):  
Sam Fletcher ◽  
Md Zahidul Islam

The ability to extract knowledge from data has been the driving force of Data Mining since its inception, and of statistical modeling long before even that. Actionable knowledge often takes the form of patterns, where a set of antecedents can be used to infer a consequent. In this paper we offer a solution to the problem of comparing different sets of patterns. Our solution allows comparisons between sets of patterns that were derived from different techniques (such as different classification algorithms), or made from different samples of data (such as temporal data or data perturbed for privacy reasons). We propose using the Jaccard index to measure the similarity between sets of patterns by converting each pattern into a single element within the set. Our measure focuses on providing conceptual simplicity, computational simplicity, interpretability, and wide applicability. The results of this measure are compared to prediction accuracy in the context of a real-world data mining scenario.
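
The core of the proposed measure, turning each pattern into one set element and comparing two pattern sets with the Jaccard index |A ∩ B| / |A ∪ B|, can be sketched as follows. This assumes a pattern is a conjunction of (attribute, value) conditions with a single consequent; the paper's exact encoding may differ, and the pattern sets below are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Condition = Tuple[str, str]
Pattern = Tuple[FrozenSet[Condition], str]


def to_element(antecedents: Iterable[Condition], consequent: str) -> Pattern:
    """Encode a pattern as one hashable element; the frozenset makes
    condition order irrelevant, so (a AND b) equals (b AND a)."""
    return (frozenset(antecedents), consequent)


def jaccard(a: Set[Pattern], b: Set[Pattern]) -> float:
    """Jaccard index of two pattern sets: shared patterns / all distinct patterns."""
    if not a and not b:
        return 1.0  # assumption: two empty pattern sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical pattern sets from two different classifiers
set_a = {
    to_element([("size", "small")], "violation"),
    to_element([("size", "large")], "compliant"),
    to_element([("sector", "food"), ("size", "small")], "violation"),
}
set_b = {
    to_element([("size", "small")], "violation"),
    to_element([("size", "small"), ("sector", "food")], "violation"),
    to_element([("sector", "retail")], "violation"),
}
similarity = jaccard(set_a, set_b)  # 2 shared patterns out of 4 distinct ones
```

Two patterns count as the same element only if both their conditions and their consequent match exactly; a looser notion of pattern equality would need a different encoding.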


2019 ◽  
Vol 8 (4) ◽  
pp. 1467-1469 ◽  

This paper introduces a proposed system that examines the growth or decay of terrorist groups over time, together with their active locations, the types of attack they carry out, their motives and targets, weapon mastery and availability, and many other parameters, in order to analyze the patterns and hidden structures in their activity and to predict the timing and type of their future attacks. We performed a detailed analysis of data gathered from different sources and applied different classification algorithms to the available data to estimate the likelihood of attacks in different regions. Based on the results, we identify which of the algorithms achieves the highest accuracy.

