Association Rule Mining Algorithms for Big Data using RDD-ECLAT Algorithms

Abstract: The revolution in technology for storing and processing big data has made data-intensive computing a new paradigm. Finding valuable and precise knowledge in big data requires efficient and scalable data mining techniques. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of existing Spark-based frequent itemset mining algorithms and to efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for Spark's distributed computing environment. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding variant. After EclatV1, the filtered-transaction technique is introduced, followed by heuristics for equivalence-class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 differ only slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.
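The abstract mentions equivalence-class partitioning. The sketch below shows that idea on a single machine. It is a minimal illustration under stated assumptions, not RDD-Eclat itself:

- Transactions are converted to vertical tidsets (item -> set of transaction ids).
- Infrequent single items are pruned.
- Each frequent item opens an independent equivalence class, which is mined depth-first by tidset intersection.

The function names (`build_tidsets`, `eclat_class`, `partitioned_eclat`) are hypothetical. In a Spark setting, each class could plausibly become a unit of parallel work, but no Spark API is used or implied here.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Convert horizontal transactions into a vertical item -> tidset map."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return tidsets


def eclat_class(prefix, members, min_sup, out):
    """Depth-first Eclat over one equivalence class.

    `members` is a list of (item, tidset) pairs that all extend `prefix`.
    """
    for i, (item, tids) in enumerate(members):
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Intersect tidsets to build the child equivalence class of `itemset`.
        children = []
        for other, other_tids in members[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_sup:
                children.append((other, common))
        if children:
            eclat_class(itemset, children, min_sup, out)


def partitioned_eclat(transactions, min_sup):
    """Mine all frequent itemsets, one prefix equivalence class at a time."""
    tidsets = build_tidsets(transactions)
    # Keep only frequent single items, in a fixed (sorted) order.
    frequent = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_sup),
        key=lambda pair: pair[0],
    )
    results = {}
    for idx, (item, tids) in enumerate(frequent):
        results[(item,)] = len(tids)
        # Equivalence class of prefix (item,): later items sharing enough tids.
        members = [
            (other, tids & other_tids)
            for other, other_tids in frequent[idx + 1:]
            if len(tids & other_tids) >= min_sup
        ]
        # Classes are independent of each other, so each could be mined separately.
        eclat_class((item,), members, min_sup, results)
    return results


db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
print(partitioned_eclat(db, min_sup=2))
```

Splitting the work by the first item of each itemset is one simple partitioning choice. The heuristics used in EclatV4 and EclatV5 for assigning classes to partitions are not reproduced here.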

Postdiffset: an Eclat-like algorithm for frequent itemset mining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.28.12911 ◽

2018 ◽

Vol 7 (2.28) ◽

pp. 197

Author(s):

W.A.W.A. Bakar ◽

M.A. Jalil ◽

M. Man ◽

Z. Abdullah ◽

F. Mohd

Keyword(s):

Data Mining ◽

Association Rule ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Underlying Structure ◽

Data Format ◽

Itemset Mining ◽

Data Formats ◽

Vertical Data ◽

Mining Algorithms

Frequent itemset mining is a major field in data mining because it deals with the regular, recurring co-occurrence of sets of items in database transactions. Originating in market basket analysis, frequent itemset generation can lead to the formulation of association rules that reveal correlations or patterns. Association rule mining remains one of the most prominent areas in data mining; it aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases. Association rule mining algorithms are built on either a horizontal or a vertical data format. Both formats have been widely discussed, with a few example algorithms for each. Horizontal approaches suffer from generating many candidates and from multiple database scans, which contribute to higher memory consumption. Vertical approaches were established to improve on the horizontal approach. The Eclat algorithm is one example of an algorithm that uses the vertical database format. Motivated by its "fast intersection", in this paper we review and analyze the fundamental Eclat and its variants, such as tidset, diffset, and sortdiffset. Building on the vertical data format, and as a continuation of the Eclat extensions, we propose postdiffset, a new Eclat variant that uses the tidset format in the first loop and the diffset format in later loops. We report the execution time of postdiffset to show that some improvement has been achieved in frequent itemset mining.
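The tidset-then-diffset switch described above can be illustrated with the standard diffset identities. The sketch rests on two assumptions:

- Single items keep full tidsets.
- From 2-itemsets onward only diffsets are stored.

The identities are:

- For 2-itemsets, d(XY) = t(X) - t(Y) and sup(XY) = sup(X) - |d(XY)|.
- For deeper levels with a shared prefix P, d(PXY) = d(PY) - d(PX) and sup(PXY) = sup(PX) - |d(PXY)|.

Where exactly postdiffset switches formats is defined in the paper, so treat the switch point here as an assumption. The function names are hypothetical.

```python
from collections import defaultdict


def _diffset_dfs(prefix, members, min_sup, results):
    """Extend `prefix` using diffsets only; members are (item, diffset, support)."""
    for i, (x, dx, sx) in enumerate(members):
        itemset = prefix + (x,)
        results[itemset] = sx
        children = []
        for y, dy, _ in members[i + 1:]:
            d = dy - dx           # d(PXY) = d(PY) - d(PX)
            sup = sx - len(d)     # sup(PXY) = sup(PX) - |d(PXY)|
            if sup >= min_sup:
                children.append((y, d, sup))
        if children:
            _diffset_dfs(itemset, children, min_sup, results)


def tidset_then_diffset(transactions, min_sup):
    """Tidsets for single items, diffsets from 2-itemsets onward (assumed switch point)."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    level1 = sorted(
        ((i, s) for i, s in tidsets.items() if len(s) >= min_sup),
        key=lambda pair: pair[0],
    )
    results = {}
    for idx, (x, tx) in enumerate(level1):
        results[(x,)] = len(tx)
        members = []
        for y, ty in level1[idx + 1:]:
            d = tx - ty               # d(XY) = t(X) - t(Y)
            sup = len(tx) - len(d)    # sup(XY) = sup(X) - |d(XY)|
            if sup >= min_sup:
                members.append((y, d, sup))
        _diffset_dfs((x,), members, min_sup, results)
    return results


db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
print(tidset_then_diffset(db, min_sup=2))
```

On dense data, diffsets tend to be much smaller than tidsets, which is the usual motivation for switching representations after the first level.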

Comparative Analysis on Frequent Itemset Mining Algorithms in Vertically Partitioned Cloud Data

10.1007/978-981-16-4625-6_38 ◽

2021 ◽

pp. 395-402

Author(s):

M. Yogasini ◽

B. N. Prathibha

Keyword(s):

Comparative Analysis ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Cloud Data ◽

Itemset Mining ◽

Mining Algorithms

Frequent Itemset Mining Algorithms—A Literature Survey

10.1007/978-981-16-2422-3_13 ◽

2021 ◽

pp. 159-166

Author(s):

M. Sinthuja ◽

D. Evangeline ◽

S. Pravinth Raja ◽

G. Shanmugarathinam

Keyword(s):

Literature Survey ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithms

Security and Verification of Server Data Using Frequent Itemset Mining in Ecommerce

International Journal of Synthetic Emotions ◽

10.4018/ijse.2017010103 ◽

2017 ◽

Vol 8 (1) ◽

pp. 31-43

Author(s):

Zuber Shaikh ◽

Antara Mohadikar ◽

Rachana Nayak ◽

Rohith Padamadan

Keyword(s):

Data Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Graphical Password ◽

Itemset Mining ◽

Frequent Item ◽

Data Mining Algorithms ◽

Shoulder Surfing ◽

Mining Algorithms ◽

Frequent Item Sets

Frequent itemsets are sets of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for each data mining algorithm. The proposed method implements a basic completeness verification and authentication approach in which the client uses a set of frequent itemsets as evidence and checks whether the server has missed any of them in its returned result. This helps the client detect an untrusted server and makes the system more efficient by reducing verification time. For authentication, CaRP is both a captcha and a graphical password scheme. CaRP addresses a number of security problems at once, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.
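The client-side completeness check can be sketched as set membership. It assumes the client already holds a set of evidence itemsets that it knows to be frequent.

Because frequent itemsets are downward closed (every non-empty subset of a frequent itemset is also frequent), the sketch also checks that those subsets appear in the server's result. This subset check is an illustrative addition, not necessarily part of the described method.

How evidence itemsets are generated so the server cannot single them out is not shown. The function name `missing_evidence` is hypothetical.

```python
from itertools import combinations


def missing_evidence(evidence, returned):
    """Return evidence itemsets (and their non-empty subsets) absent from the server result.

    An empty result means no missing frequent itemset was detected.
    """
    returned_sets = {frozenset(s) for s in returned}
    missing = set()
    for itemset in evidence:
        items = sorted(itemset)
        # Downward closure: every non-empty subset of a frequent itemset is frequent.
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                if frozenset(subset) not in returned_sets:
                    missing.add(frozenset(subset))
    return missing


evidence = [{"milk", "bread"}]
server_result = [{"milk"}, {"bread"}]  # the server omitted {"milk", "bread"}
print(missing_evidence(evidence, server_result))
```

A non-empty result is a signal that the server's answer is incomplete. An empty result does not prove completeness, since the check covers only the evidence the client holds.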

Apriori-based frequent itemset mining algorithms on MapReduce

Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication - ICUIMC '12 ◽

10.1145/2184751.2184842 ◽

2012 ◽

Cited By ~ 107

Author(s):

Ming-Yen Lin ◽

Pei-Yu Lee ◽

Sue-Chen Hsueh

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithms

Data Mining Itemset of Big Data Using Pre-Processing Based on Mapreduce FrameWork with ETL Tools

APTIKOM Journal on Computer Science and Information Technologies ◽

10.11591/aptikom.j.csit.103 ◽

2017 ◽

Vol 2 (2) ◽

pp. 57-62

Author(s):

Padmanathan Anantharaman ◽

H.V. Ramakrishan

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Programming Model ◽

Hybrid Approach ◽

Processing Technique ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Dataset Size

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases, forcing IT organizations into costly upgrades of databases and data warehouse hardware appliances. As technologies advance and people use them in day-to-day activities, enormous amounts of data are being generated through the Internet of Things (IoT); this data is termed Big Data and has its own characteristics and challenges. Frequent itemset mining algorithms aim to discover frequent itemsets in transactional databases, but as dataset size increases, traditional frequent itemset mining can no longer handle it. The MapReduce programming model addresses the problem of large datasets, but its high communication cost reduces execution efficiency. This paper proposes a new k-means pre-processing technique applied to the BigFIM algorithm. The resulting ClustBigFIM uses a hybrid approach: k-means clustering generates clusters from huge datasets, and Apriori and Eclat then mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that applying k-means clustering as a pre-processing step before BigFIM increases the execution efficiency of ClustBigFIM.
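The MapReduce pattern used for Apriori-style counting can be simulated in plain Python. Each data partition (standing in for an input split or cluster) runs a "map" step that emits locally combined candidate counts, and a "reduce" step sums them and applies the minimum support.

Candidates of size k are counted only if all their (k-1)-subsets were frequent, which is the Apriori pruning rule. This is a generic sketch, not ClustBigFIM or BigFIM: there is no k-means step and no Hadoop API. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(partition, k, prev):
    """Mapper: count k-itemsets in one partition (local combine via Counter)."""
    counts = Counter()
    for t in partition:
        for c in combinations(sorted(set(t)), k):
            # Apriori pruning: every (k-1)-subset must be frequent.
            if prev is None or all(s in prev for s in combinations(c, k - 1)):
                counts[c] += 1
    return counts


def reduce_phase(partials, min_sup):
    """Reducer: sum partial counts across partitions and keep frequent itemsets."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return {c: n for c, n in total.items() if n >= min_sup}


def apriori_mapreduce(partitions, min_sup):
    """Level-wise driver: one map/reduce round per itemset size k."""
    result, prev, k = {}, None, 1
    while True:
        level = reduce_phase([map_phase(p, k, prev) for p in partitions], min_sup)
        if not level:
            break
        result.update(level)
        prev, k = set(level), k + 1
    return result


parts = [[["a", "b", "c"], ["a", "b"]], [["a", "c"], ["b", "c"], ["a", "b", "c"]]]
print(apriori_mapreduce(parts, min_sup=2))
```

Each level costs one full pass over all partitions plus a shuffle of partial counts. This per-iteration communication is the overhead the abstract attributes to MapReduce-based mining.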

HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing

The Journal of Supercomputing ◽

10.1007/s11227-017-1963-4 ◽

2017 ◽

Vol 73 (8) ◽

pp. 3652-3668 ◽

Cited By ~ 24

Author(s):

Krishan Kumar Sethi ◽

Dharavath Ramesh

Keyword(s):

Big Data ◽

Data Processing ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Big Data Processing ◽

Itemset Mining ◽

Mining Algorithm

Modern Applications and Challenges for Rare Itemset Mining

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2021.11.3.1037 ◽

2021 ◽

Vol 11 (3) ◽

pp. 208-218

Author(s):

Sadeq Darrab ◽

David Broneske ◽

Gunter Saake

Keyword(s):

Data Mining ◽

Real Life ◽

The State ◽

Frequent Itemset ◽

Future Research ◽

Comprehensive Overview ◽

Itemset Mining ◽

Equipment Failures ◽

Mining Algorithms ◽

Rare Itemsets

Data mining is the process of extracting useful, previously unknown knowledge from large datasets. Frequent itemset mining is a fundamental data mining task that aims to discover interesting itemsets that frequently appear together in a dataset. However, mining infrequent (rare) itemsets may be more interesting in many real-life applications, such as predicting telecommunication equipment failures, genetics, medical diagnosis, or anomaly detection. In this paper, we survey up-to-date methods for rare itemset mining. The main goal of this survey is to provide a comprehensive overview of state-of-the-art rare itemset mining algorithms and their applications. The main contributions of this survey can be summarized as follows. In the first part, we define the task of rare itemset mining by explaining key concepts and terminology, giving motivating examples, and comparing it with underlying concepts. Then, we highlight state-of-the-art methods for rare itemset mining. Furthermore, we present variations of the rare itemset mining task to discuss the limitations of traditional rare itemset mining algorithms. After that, we highlight the fundamental applications of rare itemset mining. Finally, we point out opportunities and challenges for future research on rare itemset mining.
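One common way to make "rare" precise uses two thresholds. An itemset is rare if its support is at least a noise floor (`min_rare_sup`) but below the frequent-itemset threshold (`min_freq_sup`). The sketch below applies this definition by brute-force enumeration up to a maximum itemset length. The threshold names and the two-threshold definition are assumptions chosen for illustration, and the surveyed algorithms use far more efficient search than this exhaustive count.

```python
from itertools import combinations


def support_counts(transactions, max_len):
    """Count the support of every itemset of size 1..max_len (exhaustive)."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for c in combinations(items, k):
                counts[c] = counts.get(c, 0) + 1
    return counts


def rare_itemsets(transactions, min_rare_sup, min_freq_sup, max_len=3):
    """Itemsets with min_rare_sup <= support < min_freq_sup (two-threshold formulation)."""
    counts = support_counts(transactions, max_len)
    return {c: n for c, n in counts.items() if min_rare_sup <= n < min_freq_sup}


logs = [["a", "b"], ["a", "b"], ["a", "b"], ["a", "c"], ["d"]]
print(rare_itemsets(logs, min_rare_sup=1, min_freq_sup=3))
```

Raising `min_rare_sup` above 1 discards one-off itemsets that are more likely to be noise than meaningful rare events.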