Comparative Analysis on Frequent Itemset Mining Algorithms in Vertically Partitioned Cloud Data

Frequent itemset mining is a major area of data mining because it deals with sets of items that commonly occur together in database transactions. Originating in market basket analysis, frequent itemset generation is the basis for formulating association rules that capture correlations or patterns. Association rule mining remains one of the most prominent areas of data mining; it aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases. Association rule mining algorithms are built on either a horizontal or a vertical data format, and both formats are discussed here with a few example algorithms for each. Horizontal approaches suffer from generating many candidates and from multiple database scans, which contribute to higher memory consumption. Vertical approaches were developed to address these shortcomings. The Eclat algorithm is one example of an algorithm that uses the vertical data format. Motivated by its "fast intersection" property, in this paper we review and analyze the fundamental Eclat algorithm and its variants, such as tidset, diffset, and sortdiffset. Building on the vertical data format and extending Eclat, we propose postdiffset, a new Eclat variant that uses the tidset format in the first iteration and diffsets in later iterations. We report the execution time of postdiffset, which indicates that some improvement has been achieved in frequent itemset mining.
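
The switch from tidsets to diffsets can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' postdiffset implementation: it reads "tidset in the first iteration" as computing 2-itemsets by tidset intersection, t(ab) = t(a) & t(b), and "diffsets in later iterations" as computing longer itemsets from diffsets. The first diffset level uses d(abc) = t(ab) - t(ac); after that, for members PX and PY of the same prefix class, d(PXY) = d(PY) - d(PX) and support(PXY) = support(PX) - |d(PXY)|. The function names vertical_db, postdiffset_sketch and _extend_with_diffsets are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]


def vertical_db(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Convert a horizontal database (TID -> items) into vertical tidsets (item -> TIDs)."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def _extend_with_diffsets(cls, min_sup: int, result: Dict[Itemset, int]) -> None:
    """Recursively extend one prefix class whose members carry diffsets.

    For members PX and PY: d(PXY) = d(PY) - d(PX), support(PXY) = support(PX) - |d(PXY)|.
    """
    for i, (x, d_x, sup_x) in enumerate(cls):
        child = []
        for y, d_y, _ in cls[i + 1:]:
            d_xy = d_y - d_x
            sup = sup_x - len(d_xy)
            if sup >= min_sup:
                itemset = x + (y[-1],)
                child.append((itemset, d_xy, sup))
                result[itemset] = sup
        if child:
            _extend_with_diffsets(child, min_sup, result)


def postdiffset_sketch(transactions: List[Set[str]], min_sup: int) -> Dict[Itemset, int]:
    """Return every frequent itemset (as a sorted tuple) mapped to its support count."""
    tidsets = vertical_db(transactions)
    items = sorted(i for i, t in tidsets.items() if len(t) >= min_sup)
    result: Dict[Itemset, int] = {(i,): len(tidsets[i]) for i in items}
    for k, a in enumerate(items):
        # First iteration (assumed): 2-itemsets by tidset intersection.
        pairs = []
        for b in items[k + 1:]:
            t_ab = tidsets[a] & tidsets[b]
            if len(t_ab) >= min_sup:
                pairs.append(((a, b), t_ab))
                result[(a, b)] = len(t_ab)
        # Later iterations (assumed): switch to diffsets, d(abc) = t(ab) - t(ac).
        for i, (x, t_x) in enumerate(pairs):
            cls = []
            for y, t_y in pairs[i + 1:]:
                d_xy = t_x - t_y
                sup = len(t_x) - len(d_xy)
                if sup >= min_sup:
                    itemset = x + (y[-1],)
                    cls.append((itemset, d_xy, sup))
                    result[itemset] = sup
            _extend_with_diffsets(cls, min_sup, result)
    return result
```

For example, on the five transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c} with a minimum support of 2, the only 3-itemset found is (a, b, c) with support 2. Its diffset {1} holds the single transaction that contains a and b but not c. Diffsets tend to shrink as itemsets grow on dense data, which is the usual motivation for switching to them after the first iteration.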

A performance based empirical study of the frequent itemset mining algorithms

2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), 2017. DOI: 10.1109/icpcsi.2017.8391988

Author(s): Ramah Sivakumar, J.G.R. Sathiaseelan

Keyword(s): Empirical Study; Frequent Itemset; Frequent Itemset Mining; Itemset Mining; Mining Algorithms; A Performance

Paradigm and Performance Analysis of Distributed Frequent Itemset Mining Algorithms Based on MapReduce

Microprocessors and Microsystems, 2021, pp. 103817. DOI: 10.1016/j.micpro.2020.103817

Author(s): Wen Xiao, Juan Hu

Keyword(s): Performance Analysis; Frequent Itemset; Frequent Itemset Mining; Itemset Mining; And Performance; Mining Algorithms

Association Rule Mining Algorithms for Big Data using RDD-ECLAT Algorithms

2021. DOI: 10.21203/rs.3.rs-935690/v1

Author(s): Martha, Ramdas Vankdothu, Hameed Mohd Abdul, Rekha Gangula

Keyword(s): Data Mining; Big Data; Frequent Itemset; Frequent Itemset Mining; New Paradigm; Rule Mining; Data Intensive; Itemset Mining; Real World Datasets; Mining Algorithms

The revolution in technology for storing and processing big data has made data-intensive computing a new paradigm. Finding valuable and precise knowledge in big data requires efficient and scalable data mining techniques. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient FIM algorithms on the Spark RDD framework. It aims to improve the performance (in terms of execution time) of existing Spark-based FIM algorithms and to efficiently re-design other FIM algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for Spark's distributed computing environment. The paper proposes and implements a parallel Eclat algorithm on the Spark RDD architecture, dubbed RDD-Eclat, in five versions: EclatV1 is the earliest, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding one: after EclatV1, the filtered-transaction technique is applied, and EclatV4 and EclatV5 add heuristics for equivalence class partitioning. EclatV2 and EclatV3 differ only slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.
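
The filtered-transaction step and the partitioning of equivalence classes across workers can be illustrated without Spark. The plain-Python sketch below is an assumption-laden illustration, not RDD-Eclat itself. It drops infrequent items from every transaction, treats each remaining item as the prefix of an equivalence class of candidate 2-itemsets, and assigns classes to a hypothetical number of workers with a greedy "largest class to the least-loaded worker" heuristic. The load estimate (number of candidate extensions per class), the function names, and num_workers are hypothetical, because the abstract does not describe the paper's actual heuristics.

```python
import heapq
from collections import Counter
from typing import Dict, List, Set


def filter_transactions(transactions: List[Set[str]], min_sup: int) -> List[Set[str]]:
    """Remove infrequent single items from every transaction (filtered transactions)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_sup}
    return [set(t) & frequent for t in transactions]


def partition_classes(transactions: List[Set[str]], min_sup: int,
                      num_workers: int) -> Dict[int, List[str]]:
    """Assign prefix-based equivalence classes to workers, largest estimated load first."""
    filtered = filter_transactions(transactions, min_sup)
    items = sorted({item for t in filtered for item in t})
    # Hypothetical load estimate: the class of item i holds candidates (i, j) for every j > i.
    loads = {item: len(items) - k - 1 for k, item in enumerate(items)}
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    assignment: Dict[int, List[str]] = {w: [] for w in range(num_workers)}
    for item in sorted(items, key=lambda i: -loads[i]):
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        assignment[w].append(item)
        heapq.heappush(heap, (load + loads[item], w))
    return assignment
```

With four frequent items a, b, c, d and two workers, the estimated loads are 3, 2, 1 and 0, so the sketch assigns classes a and d to one worker and b and c to the other, giving each a load of 3. In a distributed setting, each worker would then mine its own classes independently, which is the property that makes prefix-based partitioning attractive for Eclat.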

TIFIM: Tree based Incremental Frequent Itemset Mining over Streaming Data

International Journal of Computers & Technology, Vol. 10 (5), pp. 1580-1586, 2013. DOI: 10.24297/ijct.v10i5.4149

Author(s): V. Sidda Reddy, Dr T.V. Rao, Dr A. Govardhan

Keyword(s): Data Streams; Data Stream; Streaming Data; Frequent Itemset; Frequent Itemset Mining; Itemset Mining; Proposed Model; Mining Model; Mining Algorithms; Memory Efficient

Data stream mining algorithms operate under space and time constraints imposed by the streaming nature of the data, and the faster the data streams in, the less these constraints can be relaxed. Because caching and mining streaming data are sensitive to these constraints, this paper devises a scalable, memory-efficient model for caching and frequent itemset mining. The proposed model is an incremental approach that builds single-level, multi-node trees called bushes from each window of the streaming data; we therefore refer to the proposed algorithm as Tree (bush) based Incremental Frequent Itemset Mining (TIFIM) over data streams.
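
The window-by-window, incremental idea can be sketched in Python. This is a generic sliding-window counter, not the TIFIM bush structure. Each arriving window is summarized into counts of itemsets up to an assumed maximum length, the summary is added to running totals, and the oldest summary is subtracted once more than num_windows windows are held. The class name and the num_windows and max_len parameters are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations
from typing import Deque, Dict, Iterable, Set, Tuple


class SlidingWindowMiner:
    """Incrementally maintain itemset counts over the most recent num_windows windows."""

    def __init__(self, num_windows: int, max_len: int = 2):
        self.num_windows = num_windows
        self.max_len = max_len
        self.windows: Deque[Counter] = deque()
        self.totals: Counter = Counter()

    def _summarize(self, batch: Iterable[Set[str]]) -> Counter:
        """Count every itemset of length 1..max_len occurring in one window."""
        counts: Counter = Counter()
        for transaction in batch:
            items = sorted(set(transaction))
            for k in range(1, self.max_len + 1):
                counts.update(combinations(items, k))
        return counts

    def add_window(self, batch: Iterable[Set[str]]) -> None:
        """Add a new window's counts and expire the oldest window if needed."""
        summary = self._summarize(batch)
        self.windows.append(summary)
        self.totals.update(summary)
        if len(self.windows) > self.num_windows:
            expired = self.windows.popleft()
            self.totals.subtract(expired)
            self.totals = +self.totals  # unary plus drops zero and negative counts

    def frequent(self, min_sup: int) -> Dict[Tuple[str, ...], int]:
        """Itemsets whose count over the retained windows reaches min_sup."""
        return {itemset: c for itemset, c in self.totals.items() if c >= min_sup}
```

Keeping one summary per window means an expiring window can be removed by subtraction instead of rescanning the retained data, so the memory cost is bounded by the number of retained windows.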

Survey on Frequent Itemset Mining Algorithms

International Journal of Computer Applications, Vol. 1 (15), pp. 94-100, 2010. DOI: 10.5120/316-484

Cited By ~ 4

Author(s): Pramod S. Pramod, O.P. Vyas

Keyword(s): Frequent Itemset; Frequent Itemset Mining; Itemset Mining; Mining Algorithms

Mining Frequent Weighted Itemsets without Storing Transaction IDs and Generating Candidates

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 25 (01), pp. 111-144, 2017. DOI: 10.1142/s0218488517500052

Cited By ~ 20

Author(s): Gangin Lee, Unil Yun, Keun Ho Ryu

Keyword(s): State Of The Art; Frequent Itemset; Experimental Results; Frequent Itemset Mining; Memory Usage; Tree Structures; Prefix Tree; Itemset Mining; Mining Methods; Mining Algorithms

Weighted itemset mining, one of the important areas of frequent itemset mining, finds meaningful itemsets by assigning different importance, or weights, to each item in a database. Because of these merits, many related works have been actively studied. As one of the methods of weighted itemset mining, FWI (Frequent Weighted Itemset) mining calculates transaction weights from item weights and then finds FWIs based on those transaction weights. However, previous FWI mining methods still have limitations in runtime and memory usage. In this paper, we therefore propose two algorithms that mine FWIs more efficiently from databases with item weights. In contrast to previous approaches, which store transaction IDs to mine FWIs, the proposed methods employ new types of prefix tree structures and mine these patterns without storing any transaction ID. Extensive experimental results show that the proposed algorithms outperform state-of-the-art FWI mining algorithms in runtime, memory usage, and scalability.
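
The transaction-weight and weighted-support calculations can be illustrated with a small brute-force Python sketch. It assumes the common FWI definitions, which the abstract does not spell out. Under those definitions, a transaction's weight tw(t) is the mean of its item weights, and an itemset X has weighted support ws(X) = (sum of tw(t) over transactions containing X) / (sum of tw(t) over all transactions). Unlike the proposed algorithms, the sketch enumerates itemsets directly and uses no prefix trees. The function names and example weights are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def transaction_weights(transactions: List[Set[str]],
                        item_weights: Dict[str, int]) -> List[Fraction]:
    """tw(t): mean weight of the items in transaction t (exact arithmetic via Fraction)."""
    return [Fraction(sum(item_weights[i] for i in t), len(t)) for t in transactions]


def weighted_support(itemset: FrozenSet[str], transactions: List[Set[str]],
                     tw: List[Fraction]) -> Fraction:
    """ws(X): total weight of transactions containing X divided by the total weight."""
    covered = sum((w for t, w in zip(transactions, tw) if itemset <= t), Fraction(0))
    return covered / sum(tw, Fraction(0))


def frequent_weighted_itemsets(transactions: List[Set[str]], item_weights: Dict[str, int],
                               min_ws: Fraction) -> Dict[FrozenSet[str], Fraction]:
    """Brute-force enumeration of FWIs; feasible only for tiny item universes."""
    tw = transaction_weights(transactions, item_weights)
    items = sorted({i for t in transactions for i in t})
    result: Dict[FrozenSet[str], Fraction] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            ws = weighted_support(frozenset(combo), transactions, tw)
            if ws >= min_ws:
                result[frozenset(combo)] = ws
    return result
```

For example, with item weights a = 3, b = 1, c = 2 and transactions {a,b}, {a,c}, {b,c}, {a,b,c}, the transaction weights are 2, 5/2, 3/2 and 2. The weighted support of {a} is therefore 13/16. Because ws can only shrink when items are added to an itemset, a practical miner can prune supersets of infrequent itemsets instead of enumerating every combination as this sketch does.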

Comparing dataset characteristics that favor the Apriori, Eclat or FP-Growth frequent itemset mining algorithms

SoutheastCon 2016, 2016. DOI: 10.1109/secon.2016.7506659

Cited By ~ 19

Author(s): Jeff Heaton

Keyword(s): Frequent Itemset; Frequent Itemset Mining; Itemset Mining; Mining Algorithms
