Comparative Analysis on Frequent Itemset Mining Algorithms in Vertically Partitioned Cloud Data

2021 ◽  
pp. 395-402
Author(s):  
M. Yogasini ◽  
B. N. Prathibha
2021 ◽  
pp. 159-166
Author(s):  
M. Sinthuja ◽  
D. Evangeline ◽  
S. Pravinth Raja ◽  
G. Shanmugarathinam

2018 ◽  
Vol 7 (2.28) ◽  
pp. 197
Author(s):  
W A.W.A. Bakar ◽  
M A. Jalil ◽  
M Man ◽  
Z Abdullah ◽  
F Mohd

Frequent itemset mining is a major field in data mining techniques. This is because it deals with usual and normal occurrences of set of items in a database transaction. Originated from market basket analysis, frequent itemset generation may lead to the formulation of association rule as to derive correlation or patterns.  Association rule mining still remains as one of the most prominent areas in data mining that aims to extract interesting correlations, frequent patterns, association or casual structures among set of items in the transaction databases. Underlying structure of association rules mining algorithms are based upon horizontal or vertical data formats. These two data formats have been widely discussed by showing few examples of algorithm of each data formats. The works on horizontal approaches suffer in many candidate generation and multiple database scans that contributes to higher memory consumptions. In response to improve on horizontal approach, the works on vertical approaches are established. Eclat algorithm is one example of algorithm in vertical approach database format. Motivated to its ‘fast intersection’, in this paper, we review and analyze the fundamental Eclat and Eclat-variants such as tidset, diffset, and sortdiffset. In response to vertical data format and as a continuity to Eclat extension, we propose a postdiffset algorithm as a new member in Eclat variants that use tidset format in the first looping and diffset in the later looping. We present the performance of postdiffset results in time execution as to indicate some improvements has been achieved in frequent itemset mining. 


2021 ◽  
Author(s):  
Martha ◽  
Ramdas Vankdothu ◽  
Hameed Mohd Abdul ◽  
Rekha Gangula

Abstract The revolution in technology for storing and processing big data leads to data intensive computing as a new paradigm. To find the valuable and precise big data knowledge, efficient and scalable data mining techniques are required. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. The problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of the existing Spark-based frequent itemset mining algorithms and efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm in the distributed computing environment of the Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version is the consequence of a different technique and heuristic being applied to the preceding variant. Following EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 are slightly different algorithmically, as are EclatV4 and EclatV5. Experiments on synthetic and real-world datasets.


2013 ◽  
Vol 10 (5) ◽  
pp. 1580-1586
Author(s):  
V.sidda Reddy ◽  
Dr T.V. Rao ◽  
Dr A. Govardhan

Data Stream Mining algorithms performs under constraints called space used and time taken, which is due to the streaming property. The relaxation in these constraints is inversely proportional to the streaming speed of the data. Since the caching and mining the streaming-data is sensitive, here in this paper a scalable, memory efficient caching and frequent itemset mining model is devised. The proposed model is an incremental approach that builds single level multi node trees called bushes from each window of the streaming data; henceforth we refer this proposed algorithm as a Tree (bush) based Incremental Frequent Itemset Mining (TIFIM) over data streams.


Author(s):  
Gangin Lee ◽  
Unil Yun ◽  
Keun Ho Ryu

Weighted itemset mining, which is one of the important areas in frequent itemset mining, is an approach for mining meaningful itemsets considering different importance or weights for each item in databases. Because of the merit of the weighted itemset mining, various related works have been studied actively. As one of the methods in the weighted itemset mining, FWI (Frequent Weighted Itemset) mining calculates weights of transactions from weights of items and then finds FWIs based on the transaction weights. However, previous FWI mining methods still have limitations in terms of runtime and memory usage performance. For this reason, in this paper, we propose two algorithms for mining FWIs more efficiently from databases with weights of items. In contrast to the previous approaches storing transaction IDs for mining FWIs, the proposed methods employ new types of prefix tree structures and mine these patterns more efficiently without storing any transaction ID. Through extensive experimental results in this paper, we show that the proposed algorithms outperform state-of-the-art FWI mining algorithms in terms of runtime, memory usage, and scalability.


Sign in / Sign up

Export Citation Format

Share Document