SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

Wen Xiao; Juan Hu

doi:10.1007/s11227-020-03190-5

The Journal of Supercomputing ◽

10.1007/s11227-020-03190-5 ◽

2020 ◽

Vol 76 (10) ◽

pp. 7619-7634 ◽

Cited By ~ 2

Author(s):

Wen Xiao ◽

Juan Hu

Keyword(s):

Data Mining ◽

Data Processing ◽

Sliding Window ◽

Frequent Itemsets ◽

Streaming Data ◽

Frequent Itemset ◽

Apache Spark ◽

Itemset Mining ◽

Mining Algorithm ◽

Vertical Data

Abstract: Finding frequent itemsets in continuous streaming data is an important data mining task that is widely used in network monitoring, Internet of Things data analysis, and similar applications. In the era of big data, a distributed frequent itemset mining algorithm is needed to meet the demands of massive streaming data processing. Apache Spark is a unified analytics engine for large-scale data processing that has been used successfully in many data mining fields. In this paper, we propose SWEclat, a distributed algorithm for mining frequent itemsets over massive streaming data. The algorithm uses a sliding window to process streaming data and a vertical data structure to store the dataset within the sliding window. It is implemented on Apache Spark and uses Spark RDDs to store both the streaming data and the vertical-format dataset, dividing these RDDs into partitions for distributed processing. Experimental results show that SWEclat achieves good speedup, parallel scalability, and load balancing.
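
The abstract combines two ingredients: a sliding window over the stream and a vertical (item to transaction-ID set) layout of the window's contents. The single-machine Python sketch below shows one way those pieces can fit together. It assumes the window holds a fixed number of micro-batches and that mining runs Eclat-style tidset intersection over the current window; it does not use Spark, and how SWEclat actually partitions and updates its RDDs is not described here. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, FrozenSet, Iterable, List, Set, Tuple


class SlidingWindowEclat:
    """Hypothetical sketch: keep the last `window_size` batches in vertical form
    (item -> set of transaction IDs) and mine them with Eclat-style intersections."""

    def __init__(self, window_size: int, min_sup: int) -> None:
        self.window_size = window_size
        self.min_sup = min_sup  # absolute support threshold within the window
        self.batches: Deque[List[int]] = deque()  # transaction IDs per batch, oldest first
        self.items_of: Dict[int, FrozenSet[str]] = {}  # tid -> items, needed for eviction
        self.tidsets: Dict[str, Set[int]] = {}  # vertical layout of the current window
        self.next_tid = 0

    def add_batch(self, transactions: Iterable[Iterable[str]]) -> None:
        """Slide the window forward by one batch."""
        tids: List[int] = []
        for txn in transactions:
            tid = self.next_tid
            self.next_tid += 1
            self.items_of[tid] = frozenset(txn)
            for item in self.items_of[tid]:
                self.tidsets.setdefault(item, set()).add(tid)
            tids.append(tid)
        self.batches.append(tids)
        # Evict the oldest batch once the window is over capacity.
        if len(self.batches) > self.window_size:
            for tid in self.batches.popleft():
                for item in self.items_of.pop(tid):
                    tidset = self.tidsets[item]
                    tidset.discard(tid)
                    if not tidset:
                        del self.tidsets[item]

    def frequent_itemsets(self) -> Dict[FrozenSet[str], int]:
        """Depth-first Eclat over the window: support = size of the tidset."""
        result: Dict[FrozenSet[str], int] = {}

        def eclat(prefix: Tuple[str, ...], members: List[Tuple[str, Set[int]]]) -> None:
            for i, (item, tids) in enumerate(members):
                itemset = prefix + (item,)
                result[frozenset(itemset)] = len(tids)
                extensions = []
                for other, other_tids in members[i + 1:]:
                    common = tids & other_tids  # tidset intersection
                    if len(common) >= self.min_sup:
                        extensions.append((other, common))
                eclat(itemset, extensions)

        frequent_items = sorted(
            ((item, tids) for item, tids in self.tidsets.items() if len(tids) >= self.min_sup),
            key=lambda pair: pair[0],
        )
        eclat((), frequent_items)
        return result
```

For example, with `window_size=2` and `min_sup=2`, adding the batches `[{"a", "b"}, {"a", "c"}]` and `[{"a", "b"}, {"b", "c"}]` yields supports a:3, b:3, c:2, {a, b}:2; adding a third batch evicts the first one, and its transactions stop contributing to every tidset. Keeping tidsets per item means eviction only touches the items of the expired transactions rather than rescanning the window.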

Security and Verification of Server Data Using Frequent Itemset Mining in Ecommerce

International Journal of Synthetic Emotions ◽

10.4018/ijse.2017010103 ◽

2017 ◽

Vol 8 (1) ◽

pp. 31-43

Author(s):

Zuber Shaikh ◽

Antara Mohadikar ◽

Rachana Nayak ◽

Rohith Padamadan

Keyword(s):

Data Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Graphical Password ◽

Itemset Mining ◽

Frequent Item ◽

Data Mining Algorithms ◽

Shoulder Surfing ◽

Mining Algorithms ◽

Frequent Item Sets

Frequent itemsets are sets of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that proofs and verification objects have to be custom-designed for different data mining algorithms. The proposed method implements a basic completeness-verification and authentication approach in which the client uses a set of frequent itemsets as evidence and checks whether the server has missed any of these evidence itemsets in its returned result. This helps the client detect an untrusted server and makes the system more efficient by reducing the time required. For authentication, the system uses CaRP, which is both a captcha and a graphical password scheme. CaRP addresses a number of security problems at once, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.
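
The completeness check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' method: the abstract does not say how the client obtains its evidence itemsets, so the sketch simply assumes the client already knows they are frequent. It also checks every non-empty subset of each evidence itemset, because by the downward-closure property every subset of a frequent itemset is itself frequent and must appear in a complete answer. The function name `missing_evidence` is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def missing_evidence(
    returned: Iterable[Iterable[str]], evidence: Iterable[Iterable[str]]
) -> Set[FrozenSet[str]]:
    """Return every evidence itemset (or subset of one) absent from the server's result.

    Assumes the client already knows each evidence itemset is frequent.
    """
    returned_sets = {frozenset(s) for s in returned}
    missing: Set[FrozenSet[str]] = set()
    for ev in evidence:
        ev_set = frozenset(ev)
        # Downward closure: every non-empty subset of a frequent itemset is frequent,
        # so a complete answer must contain all of them too.
        for size in range(1, len(ev_set) + 1):
            for subset in combinations(ev_set, size):
                candidate = frozenset(subset)
                if candidate not in returned_sets:
                    missing.add(candidate)
    return missing
```

A non-empty result means the server's answer is provably incomplete with respect to the evidence, so the client can treat the server as untrusted; an empty result only shows that no evidence itemset was omitted.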

HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing

The Journal of Supercomputing ◽

10.1007/s11227-017-1963-4 ◽

2017 ◽

Vol 73 (8) ◽

pp. 3652-3668 ◽

Cited By ~ 24

Author(s):

Krishan Kumar Sethi ◽

Dharavath Ramesh

Keyword(s):

Big Data ◽

Data Processing ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Big Data Processing ◽

Itemset Mining ◽

Mining Algorithm

A Dynamic Sliding Window based Balanced Parallel Frequent Itemset Mining Algorithm in Data Stream

International Journal of Computer Applications ◽

10.5120/ijca2020920670 ◽

2020 ◽

Vol 175 (16) ◽

pp. 48-55

Author(s):

Zakria Mahrousa ◽

Dima Mufti Alchawafa ◽

Hasan Kazzaz

Keyword(s):

Data Stream ◽

Sliding Window ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithm ◽

Dynamic Sliding Window

Postdiffset: an Eclat-like algorithm for frequent itemset mining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.28.12911 ◽

2018 ◽

Vol 7 (2.28) ◽

pp. 197

Author(s):

W A.W.A. Bakar ◽

M A. Jalil ◽

M Man ◽

Z Abdullah ◽

F Mohd

Keyword(s):

Data Mining ◽

Association Rule ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Underlying Structure ◽

Data Format ◽

Itemset Mining ◽

Data Formats ◽

Vertical Data ◽

Mining Algorithms

Frequent itemset mining is a major field among data mining techniques because it deals with the common, regular co-occurrence of sets of items in database transactions. Originating from market basket analysis, frequent itemset generation may lead to the formulation of association rules that derive correlations or patterns. Association rule mining remains one of the most prominent areas in data mining; it aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases. The underlying structure of association rule mining algorithms is based on either the horizontal or the vertical data format. Both data formats are discussed, with a few example algorithms for each. Horizontal approaches suffer from generating many candidates and from multiple database scans, which contribute to higher memory consumption. Vertical approaches were developed to improve on the horizontal approach, and the Eclat algorithm is one example of a vertical-format algorithm. Motivated by its 'fast intersection', in this paper we review and analyze the fundamental Eclat and Eclat variants such as tidset, diffset, and sortdiffset. Building on the vertical data format and as a continuation of the Eclat extensions, we propose postdiffset, a new member of the Eclat variants that uses the tidset format in the first loop and diffsets in later loops. We report the execution time of postdiffset to show that some improvement has been achieved in frequent itemset mining.
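
The tidset-then-diffset idea can be illustrated with a short Python sketch. It assumes "first loop" means computing 2-itemsets by tidset intersection and then switching to diffsets for all longer itemsets, using the standard dEclat identities: for a prefix P, d(PXY) = d(PY) − d(PX) and support(PXY) = support(PX) − |d(PXY)|, where the first diffset level is obtained from tidsets as d(axy) = t(ax) − t(ay). This is a sketch of the general technique under that assumption, not the authors' postdiffset implementation; the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Itemsets = Dict[FrozenSet[str], int]


def postdiffset_sketch(transactions: Iterable[Iterable[str]], min_sup: int) -> Itemsets:
    """Eclat variant: tidsets for 1- and 2-itemsets, diffsets for larger ones."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, txn in enumerate(transactions):
        for item in set(txn):
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_sup)
    result: Itemsets = {frozenset([i]): len(tidsets[i]) for i in items}

    for idx, a in enumerate(items):
        # First loop: plain tidset intersection, t(ab) = t(a) & t(b).
        members: List[Tuple[str, Set[int]]] = []
        for b in items[idx + 1:]:
            t_ab = tidsets[a] & tidsets[b]
            if len(t_ab) >= min_sup:
                result[frozenset([a, b])] = len(t_ab)
                members.append((b, t_ab))
        # Switch to diffsets: for prefix P = {a, x}, d(P + y) = t(ax) - t(ay)
        # and support(P + y) = support(ax) - |d(P + y)|.
        for i, (x, t_ax) in enumerate(members):
            ext: List[Tuple[str, Set[int], int]] = []
            for y, t_ay in members[i + 1:]:
                d = t_ax - t_ay
                sup = len(t_ax) - len(d)
                if sup >= min_sup:
                    result[frozenset([a, x, y])] = sup
                    ext.append((y, d, sup))
            _extend_with_diffsets((a, x), ext, min_sup, result)
    return result


def _extend_with_diffsets(
    prefix: Tuple[str, ...],
    members: List[Tuple[str, Set[int], int]],
    min_sup: int,
    result: Itemsets,
) -> None:
    """Later loops: d(PXY) = d(PY) - d(PX), support(PXY) = support(PX) - |d(PXY)|."""
    for i, (x, d_px, sup_px) in enumerate(members):
        ext: List[Tuple[str, Set[int], int]] = []
        for y, d_py, _ in members[i + 1:]:
            d = d_py - d_px
            sup = sup_px - len(d)
            if sup >= min_sup:
                result[frozenset(prefix + (x, y))] = sup
                ext.append((y, d, sup))
        _extend_with_diffsets(prefix + (x,), ext, min_sup, result)
```

On dense data, diffsets tend to be much smaller than tidsets, which is the usual motivation for switching representations once itemsets grow longer.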

An Efficient Method for Frequent Itemset Mining on Temporal Data

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1953162 ◽

2019 ◽

pp. 558-568

Author(s):

Fathima Sherin T K ◽

Anish Kumar B.

Keyword(s):

Data Mining ◽

Computation Time ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Edge Density ◽

Time Interval ◽

Related Data ◽

Itemset Mining ◽

A Value

Frequent itemset mining (FIM) is a data mining technique for extracting frequent itemsets from a database. Existing methods for finding frequent itemsets assume that datasets are static and that the discovered rules hold throughout the entire dataset. However, this is not the case when the data is temporal, i.e., when it contains time-related information that changes the mining results. Patterns may occur throughout the whole period or only during specific intervals; to restrict mining to particular time intervals, frequent itemset mining with a time cube has been proposed to incorporate time into the mining process. This makes it possible to recognize patterns that occur periodically, within a time interval, or both. This paper therefore focuses on developing an efficient algorithm to mine frequent itemsets and their associated time intervals from a transactional database, extending an earlier algorithm based on support by adding density as a second threshold. Density is introduced to deal with the problem of overestimated time spans and to ensure the validity of the discovered patterns. As an extension of the existing framework, the density rate and the minimum threshold, which were previously user-specified parameters, are generated dynamically. In addition, computation time is compared between mining the dataset with and without partitioning, which shows that computation time is lower with the partitioning technique.
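
To make the interval-based idea concrete, the following Python sketch partitions timestamped transactions into fixed-width time intervals and reports the intervals in which each itemset reaches a relative-support threshold. It is a simplification under stated assumptions: fixed-width intervals stand in for the time cube, itemsets are limited to `max_len` items by brute-force enumeration, and the paper's density measure and dynamically generated thresholds are not reproduced because the abstract does not define them. All names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Interval = Tuple[int, int, float]  # (start, end, relative support)


def frequent_by_interval(
    records: Iterable[Tuple[int, Iterable[str]]],
    interval_len: int,
    min_rel_sup: float,
    max_len: int = 2,
) -> Dict[FrozenSet[str], List[Interval]]:
    """Partition timestamped transactions into fixed-width intervals and report,
    for each itemset (up to max_len items), the intervals in which it is frequent."""
    partitions: Dict[int, List[FrozenSet[str]]] = defaultdict(list)
    for ts, items in records:
        partitions[ts // interval_len].append(frozenset(items))

    found: Dict[FrozenSet[str], List[Interval]] = {}
    for bucket, txns in sorted(partitions.items()):
        counts: Counter = Counter()
        for txn in txns:
            for size in range(1, min(max_len, len(txn)) + 1):
                for combo in combinations(sorted(txn), size):
                    counts[frozenset(combo)] += 1
        start = bucket * interval_len
        for itemset, count in counts.items():
            rel_sup = count / len(txns)  # support relative to this interval only
            if rel_sup >= min_rel_sup:
                found.setdefault(itemset, []).append((start, start + interval_len, rel_sup))
    return found
```

Because each partition is counted independently, an itemset that is rare over the whole dataset can still be reported for the one interval in which it is concentrated; counting per partition is also what allows the work to be split by time range.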

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging because sampling-based methods, such as the popular reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract a KMV synopsis for each item from the stream. Then, we propose a novel estimator of an itemset's frequency over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to compute but also satisfies the downward-closure property. These properties allow our new estimator to be incorporated into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it can guarantee the accuracy of FIM with a bounded KMV synopsis size. Experimental results on massive streams show that our estimator significantly improves the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
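
The article's own estimator is not described in the abstract, so the Python sketch below shows only the standard KMV building blocks it starts from: one synopsis per item holding the k smallest hash values of the keys (e.g., customer IDs) that carry that item, and the textbook KMV estimate of how many distinct keys are shared by all items of an itemset. It assumes an itemset's frequency is the number of distinct keys associated with every item in it. This is not the authors' estimator, and names such as `KMVSynopsis` are hypothetical.

```python
import hashlib
import heapq
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def unit_hash(key: Hashable) -> float:
    """Map a key to a pseudo-random value in [0, 1) using 48 bits of SHA-1."""
    digest = hashlib.sha1(repr(key).encode()).digest()
    return int.from_bytes(digest[:6], "big") / float(1 << 48)


class KMVSynopsis:
    """Keeps the k smallest distinct hash values of the keys seen for one item."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap: List[float] = []  # max-heap via negated values
        self.values: Set[float] = set()

    def add(self, key: Hashable) -> None:
        v = unit_hash(key)
        if v in self.values:
            return  # same key seen again (e.g. another record of the same customer)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -v)
            self.values.add(v)
        elif v < -self._heap[0]:
            evicted = -heapq.heappushpop(self._heap, -v)  # drops the current maximum
            self.values.discard(evicted)
            self.values.add(v)


def estimate_common_keys(synopses: List[KMVSynopsis]) -> float:
    """Standard KMV estimate of how many distinct keys appear in every synopsis."""
    k = synopses[0].k  # all synopses are assumed to share the same k
    union = sorted(set().union(*(s.values for s in synopses)))[:k]
    shared = sum(1 for v in union if all(v in s.values for s in synopses))
    if len(union) < k:
        return float(shared)  # no synopsis is full, so the count is exact
    u_k = union[-1]  # k-th smallest hash value in the union
    return (shared / k) * (k - 1) / u_k


def build_synopses(
    stream: Iterable[Tuple[Hashable, Iterable[str]]], k: int
) -> Dict[str, KMVSynopsis]:
    """One synopsis per item over a stream of (key, items) records."""
    synopses: Dict[str, KMVSynopsis] = {}
    for key, items in stream:
        for item in items:
            synopses.setdefault(item, KMVSynopsis(k)).add(key)
    return synopses
```

Repeated records with the same key hash to the same value, so they do not inflate the estimate; this is the property that makes KMV synopses usable when one key has many transactions, a setting in which sampling individual transactions breaks down.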

A new closed frequent itemset mining algorithm based on GPU and improved vertical structure

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.3904 ◽

2016 ◽

Vol 29 (6) ◽

pp. e3904 ◽

Cited By ~ 6

Author(s):

Yun Li ◽

Jie Xu ◽

Yun-Hao Yuan ◽

Ling Chen

Keyword(s):

Vertical Structure ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithm ◽

Closed Frequent Itemset

Using apache spark to collect analytic from the streaming data processing application logs

2018 7th Mediterranean Conference on Embedded Computing (MECO) ◽

10.1109/meco.2018.8406048 ◽

2018 ◽

Author(s):

Golovanov Mikhail Evgenyevich ◽

Bakulev Aleksandr Valerievich ◽

Bakuleva Marina Alekseevna

Keyword(s):

Data Processing ◽

Streaming Data ◽

Apache Spark ◽

Processing Application

A False Negative Maximal Frequent Itemset Mining Algorithm over Stream

Advanced Data Mining and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-642-25853-4_3 ◽

2011 ◽

pp. 29-41 ◽

Cited By ~ 2

Author(s):

Haifeng Li ◽

Ning Zhang

Keyword(s):

False Negative ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithm

Data-based affinity analysis of power transformer defects with adaptive frequent itemset mining algorithm

2017 3rd IEEE International Conference on Computer and Communications (ICCC) ◽

10.1109/compcomm.2017.8323052 ◽

2017 ◽

Author(s):

Z. W. Zhang ◽

W. S. Gao ◽

W. X. Mo ◽

H. B. Wang ◽

L. Luan

Keyword(s):

Power Transformer ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithm
