A Review of Scalable Algorithms for Frequent Itemset Mining for Big Data Using Hadoop and Spark

Author(s):  
Yassir Rochd ◽  
Imad Hafidi ◽  
Bajil Ouartassi
Author(s):  
Padmanathan Anantharaman ◽  
H.V. Ramakrishan

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases. Is your IT organization forced into costly upgrades to expensive databases and data warehouse hardware appliances and enormous amount of data is getting explored through Internet of Things (IoT) as technologies are advancing and people uses these technologies in day to day activities, this data is termed as Big Data having its characteristics and challenges. Frequent Itemset Mining algorithms are aimed to disclose frequent itemsets from transactional database but as the dataset size increases, it cannot be handled by traditional frequent itemset mining. MapReduce programming model solves the problem of large datasets but it has large communication cost which reduces execution efficiency. This proposed new pre-processed k-means technique applied on BigFIM algorithm. ClustBigFIM uses hybrid approach, clustering using k-means algorithm to generate Clusters from huge datasets and Apriori and Eclat to mine frequent itemsets from generated clusters using MapReduce programming model. Results shown that execution efficiency of ClustBigFIM algorithm is increased by applying k-means clustering algorithm before BigFIM algorithm as one of the pre-processing technique.


2019 ◽  
Vol 163 ◽  
pp. 666-674 ◽  
Author(s):  
Carlos Fernandez-Basso ◽  
Abel J. Francisco-Agra ◽  
Maria J. Martin-Bautista ◽  
M. Dolores Ruiz

2021 ◽  
Author(s):  
Martha ◽  
Ramdas Vankdothu ◽  
Hameed Mohd Abdul ◽  
Rekha Gangula

Abstract The revolution in technology for storing and processing big data leads to data intensive computing as a new paradigm. To find the valuable and precise big data knowledge, efficient and scalable data mining techniques are required. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. The problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of the existing Spark-based frequent itemset mining algorithms and efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm in the distributed computing environment of the Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version is the consequence of a different technique and heuristic being applied to the preceding variant. Following EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 are slightly different algorithmically, as are EclatV4 and EclatV5. Experiments on synthetic and real-world datasets.


Frequent itemset mining is very crucial to minimize the cost and time of executions but when considering multiple distributed data streams in big data the frequent itemset mining has been a little cost consuming and taking more space and time complexity. In this paper we reduce the load and minimize the cost while minimizing the space and time complexities of the process by using reduction mechanism and indexing structures for preserving complexities. A 2-level architecture modal which will be helpful in handling the distributed data streams where the root node will be in level-0 and local nodes at level-1 is proposed. Each local node will evaluate the patterns in their specific data stream using the algorithm ‘FP’ which will help in lessening the burden on the root node and will be sent to root. With help of the patterns received from local nodes the root will generate a global pattern set.


Sign in / Sign up

Export Citation Format

Share Document