Data Mining Itemset of Big Data Using Pre-Processing Based on Mapreduce FrameWork with ETL Tools

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases. Is your IT organization forced into costly upgrades to expensive databases and data warehouse hardware appliances and enormous amount of data is getting explored through Internet of Things (IoT) as technologies are advancing and people uses these technologies in day to day activities, this data is termed as Big Data having its characteristics and challenges. Frequent Itemset Mining algorithms are aimed to disclose frequent itemsets from transactional database but as the dataset size increases, it cannot be handled by traditional frequent itemset mining. MapReduce programming model solves the problem of large datasets but it has large communication cost which reduces execution efficiency. This proposed new pre-processed k-means technique applied on BigFIM algorithm. ClustBigFIM uses hybrid approach, clustering using k-means algorithm to generate Clusters from huge datasets and Apriori and Eclat to mine frequent itemsets from generated clusters using MapReduce programming model. Results shown that execution efficiency of ClustBigFIM algorithm is increased by applying k-means clustering algorithm before BigFIM algorithm as one of the pre-processing technique.

Download Full-text

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

The streams where multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different time. Itemset frequency estimation on such streams is very challenging since sampling based methods, such as the popularly used reservoir sampling, cannot be used. In this article, we propose a novel k -Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Comparing to the existing estimator, our method is not only more accurate and efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator with existing frequent itemset mining (FIM) algorithm (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show our estimator can significantly improve on the accuracy for both estimating itemset frequency and FIM compared to the existing estimators.

Download Full-text

HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing

The Journal of Supercomputing ◽

10.1007/s11227-017-1963-4 ◽

2017 ◽

Vol 73 (8) ◽

pp. 3652-3668 ◽

Cited By ~ 24

Author(s):

Krishan Kumar Sethi ◽

Dharavath Ramesh

Keyword(s):

Big Data ◽

Data Processing ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Big Data Processing ◽

Itemset Mining ◽

Mining Algorithm

Download Full-text

Finding tendencies in streaming data using Big Data frequent itemset mining

Knowledge-Based Systems ◽

10.1016/j.knosys.2018.09.026 ◽

2019 ◽

Vol 163 ◽

pp. 666-674 ◽

Cited By ~ 12

Author(s):

Carlos Fernandez-Basso ◽

Abel J. Francisco-Agra ◽

Maria J. Martin-Bautista ◽

M. Dolores Ruiz

Keyword(s):

Big Data ◽

Streaming Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Download Full-text

An Efficient Method for Frequent Itemset Mining on Temporal Data

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1953162 ◽

2019 ◽

pp. 558-568

Author(s):

Fathima Sherin T K ◽

Anish Kumar B.

Keyword(s):

Data Mining ◽

Computation Time ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Edge Density ◽

Time Interval ◽

Related Data ◽

Itemset Mining ◽

A Value

Frequent itemset mining (FIM) is a data mining idea with extracting frequent itemset from a database. Finding frequent itemsets in existing methods accept that datasets are static or steady and enlisted guidelines are pertinent all through the total dataset. In any case, this isn't the situation when information is temporal which contains time-related data that changes data mining results. Patterns may occur during all or at specific interims, to limit time interims, frequent itemset mining with time cube is proposed to manage time arranges in the mining technique. This is how patterns are perceived that happen occasionally, in a period interim, or both. Thus, this paper mostly centres around developing up a productive calculation to mine frequent itemsets and their related time interval from a value-based database by expanding from the earlier calculation dependent on support and density as another edge. Density is proposed to deal with the overestimated timespan issue and to ensure the authenticity of the patterns found. As an extension from the current framework, here the density rate and minimum threshold is dynamically generated which is user determined parameter previously. Likewise, an analysis concerning time is made between dataset with partitioning and without apportioning the dataset, which shows computation time is less on account of partitioning technique.

Download Full-text

Frequent Itemset Mining for Big Data

2013 IEEE International Conference on Big Data ◽

10.1109/bigdata.2013.6691742 ◽

2013 ◽

Cited By ~ 94

Author(s):

Sandy Moens ◽

Emin Aksehirli ◽

Bart Goethals

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Download Full-text

Frequent itemset mining for Big data

2015 International Conference on Green Computing and Internet of Things (ICGCIoT) ◽

10.1109/icgciot.2015.7380679 ◽

2015 ◽

Cited By ~ 5

Author(s):

Kiran Chavan ◽

Priyanka Kulkarni ◽

Pooja Ghodekar ◽

S.N. Patil

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Download Full-text

Hp-Apriori: Horizontal parallel-apriori algorithm for frequent itemset mining from big data

2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA)( ◽

10.1109/icbda.2017.8078825 ◽

2017 ◽

Author(s):

Mohammad-Hossein Nadimi-Shahraki ◽

Mehdi Mansouri

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Apriori Algorithm ◽

Itemset Mining

Download Full-text

Frequent Itemset Mining for Big Data Using Greatest Common Divisor Technique

Data Science Journal ◽

10.5334/dsj-2017-025 ◽

2017 ◽

Vol 16 ◽

Cited By ~ 1

Author(s):

Mohamed A. Gawwad ◽

Mona F. Ahmed ◽

Magda B. Fayek

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Greatest Common Divisor ◽

Itemset Mining

Download Full-text

Association Rule Mining Algorithms for Big Data using RDD-ECLAT Algorithms

10.21203/rs.3.rs-935690/v1 ◽

2021 ◽

Author(s):

Martha ◽

Ramdas Vankdothu ◽

Hameed Mohd Abdul ◽

Rekha Gangula

Keyword(s):

Data Mining ◽

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

New Paradigm ◽

Rule Mining ◽

Data Intensive ◽

Itemset Mining ◽

Real World Datasets ◽

Mining Algorithms

Abstract The revolution in technology for storing and processing big data leads to data intensive computing as a new paradigm. To find the valuable and precise big data knowledge, efficient and scalable data mining techniques are required. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. The problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of the existing Spark-based frequent itemset mining algorithms and efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm in the distributed computing environment of the Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version is the consequence of a different technique and heuristic being applied to the preceding variant. Following EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 are slightly different algorithmically, as are EclatV4 and EclatV5. Experiments on synthetic and real-world datasets.

Download Full-text

Reduction of Frequent Itemsets Mining in Big Data with the Help of FP Algorithm and Msegt-Tree

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1666.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2169-2172

Keyword(s):

Big Data ◽

Data Streams ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Distributed Data ◽

Root Node ◽

Itemset Mining ◽

Space And Time ◽

Distributed Data Streams ◽

The Cost

Frequent itemset mining is very crucial to minimize the cost and time of executions but when considering multiple distributed data streams in big data the frequent itemset mining has been a little cost consuming and taking more space and time complexity. In this paper we reduce the load and minimize the cost while minimizing the space and time complexities of the process by using reduction mechanism and indexing structures for preserving complexities. A 2-level architecture modal which will be helpful in handling the distributed data streams where the root node will be in level-0 and local nodes at level-1 is proposed. Each local node will evaluate the patterns in their specific data stream using the algorithm ‘FP’ which will help in lessening the burden on the root node and will be sent to root. With help of the patterns received from local nodes the root will generate a global pattern set.

Download Full-text