Mining Closed Item sets from Tuple-Evolving Data Streams

Frequent itemset mining (FIM) plays a major role in extracting useful knowledge from data streams with high data flow. Most data stream studies treat every incoming record as a new tuple, but in some applications, called tuple-evolving data streams, an incoming record may be a revision of an earlier tuple. Extracting less redundant knowledge from such applications supports better decision making but raises new challenges. One issue is that, because of incoming revised tuples, some frequent itemsets may become infrequent and previously ignored itemsets may become frequent. Another issue is that the result of FIM may be huge and redundant. In this paper, we address these problems by finding closed itemsets from tuple-revision data streams. We propose an efficient approach, MCST, that uses a compressed SlideTree data structure to maintain stream data, an HIS hash table to maintain itemsets, and CIS tables to maintain closed id sets to improve the search performance of HIS.
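
To make the two issues above concrete, the following is a minimal brute-force sketch of what "closed" means and of how one revised tuple changes the result. It is not the MCST approach: the SlideTree, HIS and CIS structures are not reproduced here, and the function names, the toy transactions and the choice to recompute from scratch after a revision are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Brute force over all item combinations, so only suitable for toy data.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                result[cand] = support
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other and support == frequent[other] for other in frequent)
    }


# Hypothetical stream content: four tuples over the items a, b, c.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("c")]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)

# A revision replaces the third tuple {a, b} with {c}; here we simply recompute.
revised = list(transactions)
revised[2] = frozenset("c")
closed_after = closed_itemsets(frequent_itemsets(revised, min_support=2))
```

On this toy input the seven frequent itemsets reduce to three closed ones ({c}, {a, b} and {a, b, c}), and after the revision {a, b} stops being closed because its support drops to that of {a, b, c}.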

ItemListFCI: An Algorithm for Mining Closed Frequent Itemsets Based on Bit Table

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.3159 ◽

2010 ◽

Vol 44-47 ◽

pp. 3159-3163

Author(s):

Ke Ming Tang ◽

Cai Yan Dai ◽

Ling Chen

Keyword(s):

Data Streams ◽

Sliding Window ◽

Real Data ◽

Frequent Itemsets ◽

Data Sets ◽

Stream Data ◽

Stream Data Mining ◽

Closed Frequent Itemsets ◽

Closed Itemsets ◽

Test Operations

Mining closed frequent itemsets in data streams is an important task in stream data mining. Most traditional algorithms for mining closed frequent itemsets are Apriori-based: they find the frequent itemsets among a large number of candidates and need a great deal of time and space. In this paper, an algorithm, ItemListFCI, for mining closed frequent itemsets in data streams is proposed. The algorithm is based on the sliding window model and uses an ItemList in which the transactions and itemsets are recorded by the column and row vectors respectively. The algorithm first builds the ItemList for the first sliding window. Frequent closed itemsets can be detected by pair-test operations on the binary numbers in the table. After building the first ItemList, the algorithm updates the ItemList for each sliding window. The frequent closed itemsets in the sliding window can be identified from the ItemList. Algorithms are also proposed to modify the ItemList when adding and deleting a transaction. The experimental results on synthetic and real data sets indicate that the proposed algorithm needs less CPU time and memory than other similar methods.
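
The bit-table idea can be sketched as follows. This is only an illustration of the general technique, not the ItemList layout or the pair-test procedure of the paper: it assumes each item is stored as a Python integer whose bit j marks membership in transaction j of the window, and the function names and the sample window are hypothetical.

```python
def build_bit_table(window):
    """Map each item to an int whose bit j is set when transaction j contains it."""
    table = {}
    for j, transaction in enumerate(window):
        for item in transaction:
            table[item] = table.get(item, 0) | (1 << j)
    return table


def tidset(table, itemset, n_transactions):
    """AND the item bit vectors: the result marks transactions holding the whole itemset."""
    bits = (1 << n_transactions) - 1
    for item in itemset:
        bits &= table.get(item, 0)
    return bits


def support(table, itemset, n_transactions):
    """Support is the number of set bits in the itemset's combined vector."""
    return bin(tidset(table, itemset, n_transactions)).count("1")


def is_closed(table, itemset, n_transactions):
    """Closed when no outside item occurs in every transaction that holds the itemset."""
    bits = tidset(table, itemset, n_transactions)
    return bits != 0 and all(
        bits & vec != bits for item, vec in table.items() if item not in itemset
    )


def slide(table, new_transaction, n_transactions):
    """Drop the oldest transaction (bit 0) and record the new one in the top bit."""
    shifted = {item: vec >> 1 for item, vec in table.items()}
    for item in new_transaction:
        shifted[item] = shifted.get(item, 0) | (1 << (n_transactions - 1))
    return {item: vec for item, vec in shifted.items() if vec}


window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
table = build_bit_table(window)
ab_support = support(table, {"a", "b"}, len(window))
ab_closed = is_closed(table, {"a", "b"}, len(window))
c_closed = is_closed(table, {"c"}, len(window))   # c never occurs without a
table_after = slide(table, {"b", "c"}, len(window))
```

Storing the vectors as integers keeps support counting and the closure test down to bitwise AND operations, and sliding the window becomes a one-bit shift per item.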

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging since sampling-based methods, such as the widely used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
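
For readers unfamiliar with KMV synopses, the sketch below shows the classic single-set version: hash every key into [0, 1), keep the k smallest distinct hash values, and estimate the number of distinct keys as (k - 1) divided by the k-th smallest value. It does not implement the article's itemset estimator. The class name, the SHA-256 based hash and the assumption that an item's synopsis summarises the keys of the transactions containing that item are illustrative choices.

```python
import hashlib
import heapq


def unit_hash(key):
    """Deterministically map a key to a float in [0, 1)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSynopsis:
    """Keep the k smallest distinct hash values seen so far."""

    def __init__(self, k):
        self.k = k
        self._heap = []        # negated hashes, so the largest kept hash is at index 0
        self._members = set()  # hashes currently kept, used to skip repeated keys

    def add(self, key):
        h = unit_hash(key)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the current largest kept hash with the smaller new one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        """Estimated number of distinct keys; exact while fewer than k were seen."""
        if len(self._heap) < self.k:
            return float(len(self._heap))
        return (self.k - 1) / -self._heap[0]


# Hypothetical usage: one synopsis per item, fed with the key of each transaction.
stream = [("cust1", {"milk", "bread"}), ("cust2", {"milk"}), ("cust1", {"milk"})]
synopses = {}
for key, transaction in stream:
    for item in transaction:
        synopses.setdefault(item, KMVSynopsis(k=64)).add(key)
milk_keys = synopses["milk"].estimate()
```

Because a repeated key hashes to the same value, a customer with many transactions is counted once, which is the property that makes this kind of synopsis a natural fit for multi-transaction streams.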

An Efficient Algorithm for Frequent Itemset Mining on Data Streams

Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining - Lecture Notes in Computer Science ◽

10.1007/11790853_37 ◽

2006 ◽

pp. 474-491 ◽

Cited By ~ 12

Author(s):

Xie Zhi-jun ◽

Chen Hong ◽

Cuiping Li

Keyword(s):

Data Streams ◽

Efficient Algorithm ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining

Security and Verification of Server Data Using Frequent Itemset Mining in Ecommerce

International Journal of Synthetic Emotions ◽

10.4018/ijse.2017010103 ◽

2017 ◽

Vol 8 (1) ◽

pp. 31-43

Author(s):

Zuber Shaikh ◽

Antara Mohadikar ◽

Rachana Nayak ◽

Rohith Padamadan

Keyword(s):

Data Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Graphical Password ◽

Itemset Mining ◽

Frequent Item ◽

Data Mining Algorithms ◽

Shoulder Surfing ◽

Mining Algorithms ◽

Frequent Item Sets

Frequent itemsets refer to a set of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for different data mining algorithms. The intended method implements a basic completeness verification and authentication approach in which the client uses a set of frequent item sets as evidence and checks whether the server has missed any of them in its returned result. This helps the client detect an untrusted server, and the system becomes more efficient by reducing time. In the authentication process, CaRP is both a captcha and a graphical password scheme. CaRP addresses a number of security problems altogether, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.

Data Mining Itemset of Big Data Using Pre-Processing Based on Mapreduce FrameWork with ETL Tools

APTIKOM Journal on Computer Science and Information Technologies ◽

10.11591/aptikom.j.csit.103 ◽

2017 ◽

Vol 2 (2) ◽

pp. 57-62

Author(s):

Padmanathan Anantharaman ◽

H.V. Ramakrishan

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Programming Model ◽

Hybrid Approach ◽

Processing Technique ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Dataset Size

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases, forcing IT organizations into costly upgrades of expensive databases and data warehouse hardware appliances. An enormous amount of data is also being explored through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; this data is termed Big Data, with its own characteristics and challenges. Frequent itemset mining algorithms aim to discover frequent itemsets in a transactional database, but as the dataset size increases, traditional frequent itemset mining can no longer handle it. The MapReduce programming model solves the problem of large datasets, but it has a large communication cost, which reduces execution efficiency. This paper proposes a new k-means pre-processing technique applied to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: clustering with the k-means algorithm to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying k-means clustering as a pre-processing technique before the BigFIM algorithm.
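
As a reminder of what the Eclat step does, here is a small single-machine sketch: each item is stored with the set of transaction ids that contain it, and longer itemsets are obtained by intersecting those sets depth first. The k-means clustering and the MapReduce distribution used by ClustBigFIM are deliberately left out, and the function name and sample data are assumptions for illustration.

```python
def eclat(transactions, min_support):
    """Depth-first frequent itemset mining over vertical transaction-id sets."""
    # Vertical layout: item -> ids of the transactions that contain it.
    vertical = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates holds (item, tids) pairs where tids covers prefix + item.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    start = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
itemsets = eclat(transactions, min_support=2)
```

The support of an itemset is just the size of an intersection, so no pass over the original transactions is needed after the vertical layout has been built.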

Probabilistic frequent itemset mining over uncertain data streams

Expert Systems with Applications ◽

10.1016/j.eswa.2018.06.042 ◽

2018 ◽

Vol 112 ◽

pp. 274-287 ◽

Cited By ~ 10

Author(s):

Haifeng Li ◽

Ning Zhang ◽

Jianming Zhu ◽

Yue Wang ◽

Huaihu Cao

Keyword(s):

Data Streams ◽

Uncertain Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Uncertain Data Streams

A block-based approach for frequent itemset mining over data streams

2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) ◽

10.1109/fskd.2011.6019903 ◽

2011 ◽

Cited By ~ 1

Author(s):

Mina Memar ◽

Mohammad Hadi Sadreddini ◽

Mahmood Deypir ◽

Seyyed Mostafa Fakhrahmad

Keyword(s):

Data Streams ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Block Based

Efficient Data Streams Based Closed Frequent Itemsets Mining Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.256-259.2910 ◽

2012 ◽

Vol 256-259 ◽

pp. 2910-2913

Author(s):

Jun Tan

Keyword(s):

Data Streams ◽

Sliding Window ◽

Frequent Itemsets ◽

Streaming Data ◽

Efficient Data ◽

Closed Itemsets ◽

Frequent Itemsets Mining ◽

Synthetic Datasets ◽

Online Mining ◽

Mining Data Streams

Online mining of frequent closed itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose a novel sliding window based algorithm. The algorithm exploits lattice properties to limit the search to frequent closed itemsets which share at least one item with the new transaction. Experimental results on synthetic datasets show that our proposed algorithm is both time and space efficient.
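
The filtering idea, touching only itemsets that share an item with the incoming transaction, can be illustrated with an inverted index over a fixed set of tracked itemsets. This is a simplified sketch, not the paper's lattice-based algorithm: it does not discover new closed itemsets, and the class name, the tracked itemsets and the window size are assumptions.

```python
from collections import deque


class WindowedSupport:
    """Maintain supports of tracked itemsets over a fixed-size sliding window."""

    def __init__(self, tracked_itemsets, window_size):
        self.window = deque()
        self.window_size = window_size
        self.support = {frozenset(s): 0 for s in tracked_itemsets}
        # Inverted index: item -> tracked itemsets that contain it.
        self.by_item = {}
        for itemset in self.support:
            for item in itemset:
                self.by_item.setdefault(item, set()).add(itemset)

    def touched(self, transaction):
        """Tracked itemsets sharing at least one item with the transaction."""
        result = set()
        for item in transaction:
            result |= self.by_item.get(item, set())
        return result

    def _apply(self, transaction, delta):
        # Only itemsets found through the index are examined at all.
        for itemset in self.touched(transaction):
            if itemset <= transaction:
                self.support[itemset] += delta

    def add(self, transaction):
        transaction = frozenset(transaction)
        if len(self.window) == self.window_size:
            self._apply(self.window.popleft(), -1)  # oldest transaction expires
        self.window.append(transaction)
        self._apply(transaction, +1)


tracker = WindowedSupport([{"a", "b"}, {"c"}, {"d"}], window_size=2)
for t in ({"a", "b"}, {"a", "b", "c"}, {"c"}):
    tracker.add(t)
```

When the transaction {c} arrives, only the tracked itemset {c} is examined; {a, b} and {d} are never visited for that update, which is where the saving comes from when the tracked set is large.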

MapReduce based frequent itemset mining algorithm on stream data

2015 Global Conference on Communication Technologies (GCCT) ◽

10.1109/gcct.2015.7342732 ◽

2015 ◽

Author(s):

Hemant Chaudhary ◽

Deepak Kumar Yadav ◽

Rajat Bhatnagar ◽

Uddagiri Chandrasekhar

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Stream Data ◽

Itemset Mining ◽

Mining Algorithm

Partition based Single Scan Method for Mining Frequent Item Sets

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f9237.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4917-4922

Keyword(s):

Unique Feature ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Minimum Support ◽

Itemset Mining ◽

Highly Sensitive ◽

Support Threshold ◽

Hidden Patterns ◽

The Cost ◽

Frequent Item Sets

This paper explores the concept and limitations of frequent itemset mining (FIM) for extracting unknown hidden patterns, as itemsets, from a transactional database. Since candidate generation and support calculation are the major tasks in FIM, the paper tackles its major limitations: (i) a huge number of possible frequent itemsets are generated as candidates at each pass, (ii) the database is scanned at each pass to calculate the support of the generated itemsets, and (iii) the generated itemsets are highly sensitive to the minimum support threshold. SS-FIM, a single-scan algorithm, deals with these limitations; however, it hashes several unnecessary itemsets into the buckets. To overcome this, a partition-based approach is proposed in this paper. The proposed approach, PSSFIM, takes a single scan of the database to identify frequent itemsets. Its unique feature is that the number of candidate itemsets generated is independent of the minimum support. It admits into the hash only candidates that can possibly be frequent, which intuitively reduces the cost of verifying the support of generated candidates. It is compared with SS-FIM and Apriori on standard datasets. The results show that PSSFIM compares favourably with SS-FIM and Apriori.
