Mining Closed Item sets from Tuple-Evolving Data Streams

Frequent itemset mining (FIM) plays a major role in extracting useful knowledge from data streams with high data flow. Most studies of data streams treat every incoming record as a new tuple, but in some applications, called tuple-evolving data streams, an incoming record may instead be a revision of an existing tuple. Extracting knowledge with little redundancy from such applications supports better decision making, but it poses new challenges. One issue is that, because of incoming revised tuples, some frequent itemsets may become infrequent and previously ignored itemsets may become frequent. Another issue is that the result of FIM may be huge and redundant. In this paper, we address these problems by finding closed itemsets in tuple-revision data streams. We propose an efficient approach, MCST, that uses a compressed SlideTree data structure to maintain the stream data, an HIS hash table to maintain itemsets, and CIS tables to maintain closed id sets, which improve the search performance of HIS.
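To make the two issues concrete, the following is a minimal brute-force sketch in Python of what "closed" means and of how a tuple revision can change the result. It is not the MCST approach: the SlideTree, HIS, and CIS structures are not reproduced here, and the function names and the tiny example stream are assumptions made for illustration only.

```python
from itertools import combinations


def support_counts(transactions):
    """Count the support of every non-empty itemset that occurs in the data."""
    counts = {}
    for items in transactions:
        items = sorted(set(items))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1
    return counts


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for the frequent itemsets that are closed."""
    counts = support_counts(transactions)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    closed = {}
    for itemset, count in frequent.items():
        # An itemset is closed if no proper superset has the same support.
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


# Hypothetical stream keyed by tuple id; each value is a set of items.
stream = {1: "abc", 2: "ab", 3: "ac"}
before = closed_frequent_itemsets(stream.values(), min_support=2)

# A revised tuple replaces the old content of tuple 3 instead of adding a tuple.
stream[3] = "bc"
after = closed_frequent_itemsets(stream.values(), min_support=2)
```

In this example `before` holds {a}: 3, {a, b}: 2, and {a, c}: 2, while `after` holds {b}: 3, {a, b}: 2, and {b, c}: 2. The revision makes {a, c} infrequent and {b, c} frequent, and in both cases the closed sets are fewer than the five frequent itemsets. The enumeration is exponential in transaction length, so it is suitable only for illustration.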

2010 ◽  
Vol 44-47 ◽  
pp. 3159-3163
Author(s):  
Ke Ming Tang ◽  
Cai Yan Dai ◽  
Ling Chen

Mining closed frequent itemsets in data streams is an important task in stream data mining. Most traditional algorithms for mining closed frequent itemsets are Apriori-based: they find the frequent itemsets among a large number of candidates and therefore need a great deal of time and space. In this paper, an algorithm, ItemListFCI, for mining closed frequent itemsets in data streams is proposed. The algorithm is based on the sliding window model and uses an ItemList in which the transactions and itemsets are recorded by the column and row vectors, respectively. The algorithm first builds the ItemList for the first sliding window. Frequent closed itemsets can be detected by pair-test operations on the binary numbers in the table. After building the first ItemList, the algorithm updates the ItemList for each subsequent sliding window, and the frequent closed itemsets in the window can be identified from the ItemList. Algorithms are also proposed to modify the ItemList when a transaction is added or deleted. Experimental results on synthetic and real data sets indicate that the proposed algorithm needs less CPU time and memory than other similar methods.
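The idea of recording each item as a row of bits over the transactions of the current window can be sketched as follows. This is only an illustration of a bit-vector sliding window, not the ItemListFCI algorithm itself; the class name `BitWindow` and its methods are hypothetical, and the pair-test procedure for detecting closed itemsets is not reproduced.

```python
class BitWindow:
    """Sliding window storing one integer bit vector per item.

    Bit 0 of a row refers to the newest transaction in the window and
    bit (size - 1) to the oldest one.
    """

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1
        self.rows = {}  # item -> bit vector stored as an int

    def add(self, transaction):
        """Slide the window by one transaction, dropping the oldest column."""
        for item in list(self.rows):
            self.rows[item] = (self.rows[item] << 1) & self.mask
        for item in set(transaction):
            self.rows[item] = self.rows.get(item, 0) | 1
        # Forget items that no longer occur anywhere in the window.
        self.rows = {item: bits for item, bits in self.rows.items() if bits}

    def support(self, itemset):
        """Support of a non-empty itemset: popcount of the AND of its rows."""
        bits = self.mask
        for item in itemset:
            bits &= self.rows.get(item, 0)
        return bin(bits).count("1")


window = BitWindow(size=3)
for transaction in ["ab", "abc", "bc", "ac"]:
    window.add(transaction)
# The window now holds "abc", "bc", "ac"; the oldest transaction "ab" has left.
```

With this layout, adding a transaction costs one shift per stored item, and the support of any itemset is obtained from bitwise AND operations without rescanning the transactions.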


2021 ◽  
Vol 16 (2) ◽  
pp. 1-30
Author(s):  
Guangtao Wang ◽  
Gao Cong ◽  
Ying Zhang ◽  
Zhen Hai ◽  
Jieping Ye

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging, since sampling-based methods, such as the widely used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared with the existing estimator, our estimator is not only more accurate and more efficient to compute but also satisfies the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared with the existing estimators.
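For readers unfamiliar with KMV synopses, the sketch below shows the classical building block: keep the k smallest hash values of the keys seen so far and estimate the number of distinct keys as (k - 1) divided by the k-th smallest value. It does not reproduce the article's itemset-frequency estimator, which the abstract does not spell out. The class name `KMV`, the helper `unit_hash`, and the choice of SHA-1 as the hash function are assumptions made for illustration.

```python
import hashlib
import heapq


def unit_hash(value):
    """Map a value deterministically to a pseudo-random number in [0, 1)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMV:
    """k-Minimum Value synopsis: keeps the k smallest hash values seen."""

    def __init__(self, k):
        self.k = k
        self.heap = []        # negated hashes, so heap[0] is the largest kept
        self.members = set()  # hashes currently kept, to ignore repeated keys

    def add(self, value):
        h = unit_hash(value)
        if h in self.members:
            return  # the same key seen again does not change the synopsis
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -h)
            self.members.add(h)
        elif h < -self.heap[0]:
            evicted = -heapq.heappushpop(self.heap, -h)
            self.members.discard(evicted)
            self.members.add(h)

    def estimate(self):
        """Estimated number of distinct keys added so far."""
        if len(self.heap) < self.k:
            return float(len(self.heap))  # exact while fewer than k keys
        return (self.k - 1) / -self.heap[0]


# One synopsis per item would record the keys (e.g. customers) that bought it.
buyers_of_milk = KMV(k=64)
for customer in ["c1", "c2", "c3", "c1", "c2"]:
    buyers_of_milk.add(customer)
```

Because the synopsis depends only on the set of distinct keys, repeated transactions from the same customer do not inflate the count, which is the property that makes this kind of summary a natural fit for multi-transaction streams.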


2017 ◽  
Vol 8 (1) ◽  
pp. 31-43
Author(s):  
Zuber Shaikh ◽  
Antara Mohadikar ◽  
Rachana Nayak ◽  
Rohith Padamadan

Frequent itemsets refer to sets of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for different data mining algorithms. The intended method implements a basic completeness verification and authentication approach in which the client uses a set of frequent itemsets as evidence and checks whether the server has omitted any of these evidence itemsets from its returned result. This helps the client detect an untrusted server and makes the system more efficient by reducing time. In the authentication process, CaRP is both a Captcha and a graphical password scheme. CaRP addresses a number of security problems altogether, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.
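The completeness check described in this abstract can be pictured with a very small sketch: the client holds a few itemsets it already knows to be frequent and looks for them in the server's answer. How the evidence is constructed is not stated in the abstract, so the function name `missing_evidence` and the example data are hypothetical.

```python
def missing_evidence(evidence, server_result):
    """Return the evidence itemsets that the server's result does not contain."""
    returned = {frozenset(itemset) for itemset in server_result}
    missing = [frozenset(e) for e in evidence if frozenset(e) not in returned]
    # Sort for a stable, readable report.
    return sorted(sorted(itemset) for itemset in missing)


# Itemsets the client knows to be frequent (assumed to be given).
evidence = [{"bread", "milk"}, {"beer"}]
# Result returned by an outsourced mining server (hypothetical).
server_result = [{"milk"}, {"bread"}, {"bread", "milk"}]

omitted = missing_evidence(evidence, server_result)
```

A non-empty `omitted` list means the server dropped at least one itemset that is known to be frequent, so its result cannot be complete. An empty list does not prove completeness; it only means no omission was detected with the evidence at hand.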


Author(s):  
Padmanathan Anantharaman ◽  
H.V. Ramakrishan

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases, and IT organizations are forced into costly upgrades of expensive databases and data warehouse hardware appliances. An enormous amount of data is being explored through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; this data is termed Big Data, with its own characteristics and challenges. Frequent itemset mining algorithms aim to discover frequent itemsets in a transactional database, but as the dataset size increases, the data can no longer be handled by traditional frequent itemset mining. The MapReduce programming model solves the problem of large datasets, but it has a large communication cost, which reduces execution efficiency. This paper proposes a new k-means pre-processing technique applied to the BigFIM algorithm. The resulting ClustBigFIM uses a hybrid approach: k-means clustering generates clusters from huge datasets, and Apriori and Eclat mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of ClustBigFIM is increased by applying k-means clustering as a pre-processing step before BigFIM.
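Of the building blocks named here, Eclat is the easiest to show in a few lines: it stores, for each item, the set of transaction ids containing it, and it grows itemsets by intersecting those sets. The sketch below is a plain single-machine version for illustration only. It includes neither the k-means pre-processing nor the MapReduce distribution of ClustBigFIM, and the function name `eclat` and the example data are assumptions.

```python
def eclat(transactions, min_support):
    """Return {itemset (as a sorted tuple): support} for all frequent itemsets."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset of prefix + item), all of them frequent.
        for index, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            narrower = []
            for other, other_tids in candidates[index + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    narrower.append((other, shared))
            if narrower:
                extend(itemset, narrower)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


result = eclat(["abc", "ab", "ac", "bc"], min_support=2)
```

Here `result` contains the three single items with support 3 and the three pairs with support 2; the triple {a, b, c} occurs only once and is pruned. In a clustered setting, one would run such a miner per cluster, but how the per-cluster results are combined is specific to the paper and is not sketched here.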


2018 ◽  
Vol 112 ◽  
pp. 274-287 ◽  
Author(s):  
Haifeng Li ◽  
Ning Zhang ◽  
Jianming Zhu ◽  
Yue Wang ◽  
Huaihu Cao

2012 ◽  
Vol 256-259 ◽  
pp. 2910-2913
Author(s):  
Jun Tan

Online mining of frequent closed itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose a novel sliding-window-based algorithm. The algorithm exploits lattice properties to limit the search to frequent closed itemsets that share at least one item with the new transaction. Experimental results on synthetic datasets show that the proposed algorithm is both time and space efficient.


The concept and limitations of frequent itemset mining (FIM) are explored in this paper, with the purpose of extracting unknown hidden patterns as itemsets from a transactional database. Since candidate generation and support calculation are the major tasks in FIM, its major limitations are tackled: (i) a huge number of possible frequent itemsets are generated as candidates at each pass; (ii) the database is scanned at each pass to calculate the support of the generated itemsets; and (iii) the generated itemsets are highly sensitive to the minimum support threshold. SS-FIM, a single-scan algorithm, was designed to deal with these limitations. However, it hashes several unnecessary itemsets into the buckets. To overcome this, a partition-based approach is proposed in this paper. The proposed approach, PSSFIM, takes a single scan of the database to identify frequent itemsets. A distinctive feature of PSSFIM is that the number of candidate itemsets it generates is independent of the minimum support. It admits into the hash only those candidates that can possibly be frequent, which reduces the cost of verifying the support of the generated candidates. PSSFIM is compared with SS-FIM and Apriori on standard datasets, and the results show that it compares favorably with both.

