A Review of Scalable Algorithms for Frequent Itemset Mining for Big Data Using Hadoop and Spark

2017 ◽

Vol 2 (2) ◽

pp. 57-62

Author(s):

Padmanathan Anantharaman ◽

H.V. Ramakrishan

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Programming Model ◽

Hybrid Approach ◽

Processing Technique ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Dataset Size

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases, which can force IT organizations into costly upgrades of expensive databases and data warehouse hardware appliances. An enormous amount of data is also generated through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; this data is termed Big Data and has its own characteristics and challenges. Frequent itemset mining algorithms aim to discover frequent itemsets in a transactional database, but traditional frequent itemset mining cannot handle the data as the dataset size increases. The MapReduce programming model addresses the problem of large datasets, but it incurs a large communication cost, which reduces execution efficiency. This work proposes a new k-means pre-processing technique applied to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: it clusters huge datasets with the k-means algorithm, then mines frequent itemsets from the generated clusters with Apriori and Eclat under the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm increases when k-means clustering is applied as a pre-processing step before BigFIM.
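The cluster-then-mine idea in this abstract can be illustrated on a single machine. The following is a minimal sketch, not the ClustBigFIM implementation: it uses no MapReduce and no Eclat phase, and the function names (`kmeans`, `apriori`, `cluster_then_mine`), the binary item-vector encoding, and the choice of the first k transactions as initial centroids are all assumptions made for illustration.

```python
from itertools import combinations

def kmeans(vectors, k, iterations=10):
    """Tiny k-means; returns one cluster label per vector.
    Initial centroids are the first k vectors (assumed, for determinism)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: mean of the members; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def apriori(transactions, min_support):
    """Level-wise Apriori; returns {frozenset(itemset): absolute support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets of the previous size, then prune any
        # candidate that has an infrequent subset.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent

def cluster_then_mine(transactions, k, min_support):
    """Cluster transactions on binary item vectors, then mine each cluster separately."""
    items = sorted({i for t in transactions for i in t})
    vectors = [[1 if i in t else 0 for i in items] for t in transactions]
    labels = kmeans(vectors, k)
    return {
        c: apriori([t for t, lab in zip(transactions, labels) if lab == c], min_support)
        for c in range(k)
    }

transactions = [
    {"bread", "milk"},
    {"pen", "ink"},
    {"bread", "milk", "butter"},
    {"pen", "ink", "paper"},
    {"bread", "butter"},
    {"pen", "paper"},
]
per_cluster = cluster_then_mine(transactions, k=2, min_support=2)
```

In this toy input the grocery and stationery transactions fall into separate clusters, so each Apriori run scans only three transactions instead of six. Note that `min_support` here is applied per cluster, which is a simplification of this sketch.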

HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing

The Journal of Supercomputing ◽

10.1007/s11227-017-1963-4 ◽

2017 ◽

Vol 73 (8) ◽

pp. 3652-3668 ◽

Cited By ~ 24

Author(s):

Krishan Kumar Sethi ◽

Dharavath Ramesh

Keyword(s):

Big Data ◽

Data Processing ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Big Data Processing ◽

Itemset Mining ◽

Mining Algorithm

Finding tendencies in streaming data using Big Data frequent itemset mining

Knowledge-Based Systems ◽

10.1016/j.knosys.2018.09.026 ◽

2019 ◽

Vol 163 ◽

pp. 666-674 ◽

Cited By ~ 12

Author(s):

Carlos Fernandez-Basso ◽

Abel J. Francisco-Agra ◽

Maria J. Martin-Bautista ◽

M. Dolores Ruiz

Keyword(s):

Big Data ◽

Streaming Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Frequent Itemset Mining for Big Data

2013 IEEE International Conference on Big Data ◽

10.1109/bigdata.2013.6691742 ◽

2013 ◽

Cited By ~ 94

Author(s):

Sandy Moens ◽

Emin Aksehirli ◽

Bart Goethals

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Frequent itemset mining for Big data

2015 International Conference on Green Computing and Internet of Things (ICGCIoT) ◽

10.1109/icgciot.2015.7380679 ◽

2015 ◽

Cited By ~ 5

Author(s):

Kiran Chavan ◽

Priyanka Kulkarni ◽

Pooja Ghodekar ◽

S.N. Patil

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Hp-Apriori: Horizontal parallel-apriori algorithm for frequent itemset mining from big data

2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA) ◽

10.1109/icbda.2017.8078825 ◽

2017 ◽

Author(s):

Mohammad-Hossein Nadimi-Shahraki ◽

Mehdi Mansouri

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Apriori Algorithm ◽

Itemset Mining

Frequent Itemset Mining for Big Data Using Greatest Common Divisor Technique

Data Science Journal ◽

10.5334/dsj-2017-025 ◽

2017 ◽

Vol 16 ◽

Cited By ~ 1

Author(s):

Mohamed A. Gawwad ◽

Mona F. Ahmed ◽

Magda B. Fayek

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Greatest Common Divisor ◽

Itemset Mining

Association Rule Mining Algorithms for Big Data using RDD-ECLAT Algorithms

10.21203/rs.3.rs-935690/v1 ◽

2021 ◽

Author(s):

Martha ◽

Ramdas Vankdothu ◽

Hameed Mohd Abdul ◽

Rekha Gangula

Keyword(s):

Data Mining ◽

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

New Paradigm ◽

Rule Mining ◽

Data Intensive ◽

Itemset Mining ◽

Real World Datasets ◽

Mining Algorithms

The revolution in technology for storing and processing big data has led to data-intensive computing as a new paradigm. Finding valuable and precise knowledge in big data requires efficient and scalable data mining techniques. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. It aims to improve the performance (in terms of execution time) of existing Spark-based frequent itemset mining algorithms and to efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for the distributed computing environment of Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding variant. After EclatV1, the filtered transaction technique is applied, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 differ slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.
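For readers unfamiliar with Eclat, the sequential core that such a parallel design builds on can be sketched as follows. This is a plain single-machine illustration and not RDD-Eclat or any of its variants: it uses no Spark, and the function name `eclat` and the sample data are assumptions. The key idea is the vertical layout, where each item maps to the set of transaction ids (tidset) containing it and support is obtained by intersecting tidsets.

```python
def eclat(transactions, min_support):
    """Depth-first Eclat over a vertical (item -> tidset) layout.
    Returns {frozenset(itemset): absolute support count}."""
    # Build the vertical database: item -> ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs already known to be frequent with prefix.
        for index, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Build the equivalence class sharing `itemset` as prefix by
            # intersecting with the tidsets of the items that come later.
            suffix = []
            for other, other_tids in candidates[index + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent

transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = eclat(transactions, min_support=3)
```

Each recursive call works on one equivalence class (itemsets sharing a prefix) and needs only that class's tidsets, which is what makes equivalence classes a natural unit for partitioning work across nodes.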

Reduction of Frequent Itemsets Mining in Big Data with the Help of FP Algorithm and Msegt-Tree

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1666.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2169-2172

Keyword(s):

Big Data ◽

Data Streams ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Distributed Data ◽

Root Node ◽

Itemset Mining ◽

Space And Time ◽

Distributed Data Streams ◽

The Cost

Frequent itemset mining is crucial for minimizing the cost and time of executions, but across multiple distributed data streams in big data it becomes costly and incurs higher space and time complexity. In this paper we reduce the load and the cost, and minimize the space and time complexity of the process, by using a reduction mechanism and indexing structures. We propose a 2-level architecture model for handling distributed data streams, with the root node at level 0 and the local nodes at level 1. Each local node evaluates the patterns in its own data stream using the 'FP' algorithm, which lessens the burden on the root node, and sends those patterns to the root. From the patterns received from the local nodes, the root generates a global pattern set.
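The two-level flow described here (local nodes mine, the root merges) can be sketched as below. This is only an illustration under stated assumptions: brute-force counting of itemsets up to size 2 stands in for the FP-based local miner, the abstract does not say how the root combines patterns so summing reported counts against a global threshold is an assumed rule, and the names `local_patterns` and `root_merge` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def local_patterns(stream, min_support, max_size=2):
    """Level-1 node: count itemsets up to max_size in one stream, keep the frequent ones.
    Brute-force counting is a stand-in for the FP-based miner named in the abstract."""
    counts = Counter()
    for transaction in stream:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}

def root_merge(reports, global_min_support):
    """Level-0 root: sum the counts reported by local nodes, apply a global threshold.
    Summing reported counts is an assumed merge rule, not taken from the paper."""
    totals = Counter()
    for report in reports:
        totals.update(report)  # Counter.update adds counts for matching keys
    return {pattern: n for pattern, n in totals.items() if n >= global_min_support}

streams = [
    [{"a", "b"}, {"a", "b", "c"}, {"a"}],
    [{"a", "b"}, {"b", "c"}, {"b", "c"}],
]
reports = [local_patterns(s, min_support=2) for s in streams]
global_patterns = root_merge(reports, global_min_support=3)
```

Because each local node forwards only its locally frequent patterns, the root sees less data but may undercount: in this toy input the pair ("a", "b") occurs three times overall, yet the second node does not report its single occurrence, so the root's total stays at two. Any real design has to decide how to handle that trade-off.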

Sequence-Growth: A Scalable and Effective Frequent Itemset Mining Algorithm for Big Data Based on MapReduce Framework

2015 IEEE International Congress on Big Data ◽

10.1109/bigdatacongress.2015.65 ◽

2015 ◽

Cited By ~ 9

Author(s):

Yen-Hui Liang ◽

Shiow-Yang Wu

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Mapreduce Framework ◽

Itemset Mining ◽

Mining Algorithm

Data Mining Itemset of Big Data Using Pre-Processing Based on Mapreduce FrameWork with ETL Tools
