An Efficient Closed Frequent Item Sets Mining Algorithm-For Mining Closed Frequent Item Sets from Data Streams

2016 ◽  
Vol 13 (10) ◽  
pp. 7467-7474
Author(s):  
Venu Madhav Kuthadi ◽  
Rajalakshmi Selvaraj

A data stream is a continuous sequence of data elements generated from a specified source. Mining frequent item sets in dynamic databases and data streams poses challenges that make the task harder than mining static databases. Much research has been done on frequent itemset mining, but existing methods share the familiar problems of memory usage and processing time, because data elements in a data stream arrive at a rapid rate and the incoming data is unbounded and possibly infinite. Given the high speed and large volume of incoming data, a frequent item set mining algorithm must work within limited memory and processing time. To address this drawback of existing methods, this paper proposes a new algorithm, named CFIM, for mining closed frequent item sets from data streams based on their utility and consistency. During mining, a hash table is maintained to check whether a given item set is closed. Computing closed frequent item sets from the data stream minimizes memory usage and processing time. The performance of the proposed technique is analyzed using a synthetic data set and compared with existing mining techniques.
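The closure check mentioned in the abstract can be illustrated with a minimal sketch. This is not the CFIM algorithm itself, whose details are not given here. It assumes the standard definition (an item set is closed if no proper superset has the same support) and uses a hash table keyed by support count. The function names `frequent_itemsets` and `closed_itemsets` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force support counting; suitable for a toy example only."""
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def closed_itemsets(frequent):
    """Keep item sets that have no proper superset with equal support."""
    # Hash table (assumed layout): support count -> item sets with that support.
    by_support = defaultdict(list)
    for itemset, support in frequent.items():
        by_support[support].append(itemset)

    closed = {}
    for itemset, support in frequent.items():
        # Only item sets in the same bucket can have the same support.
        bucket = by_support[support]
        if not any(itemset < other for other in bucket):
            closed[itemset] = support
    return closed


if __name__ == "__main__":
    txns = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"]]
    print(closed_itemsets(frequent_itemsets(txns, min_support=2)))
```

Grouping by support means each item set is compared only against candidates that could actually violate closure, rather than against every frequent item set.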

2011 ◽  
Vol 130-134 ◽  
pp. 3702-3707
Author(s):  
Zhi Hua Chen ◽  
Jun Luo

Given the mobility and continuity of data streams, this paper presents an algorithm called NSWR to mine frequent item sets from a fast sliding window over data streams, meeting the need to obtain frequent item sets over recently arrived data. NSWR uses an effective bit-sequence representation of items, based on the data stream sliding window, to store data; supports queries under different support thresholds through a hash-table-based query method over the frequent closed item set results; and offers a screening method, based on the classification of closed item sets, that reduces the number of item sets requiring closure judgments and thereby reduces the computational complexity. Experiments show that the algorithm has better time and space efficiency.
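A bit-sequence representation over a sliding window can be sketched as follows. This is an illustration of the general idea under stated assumptions, not the paper's data structure: each item is assumed to carry one bit per window slot, the newest transaction is assumed to sit in the lowest bit, and the class name `BitWindow` is hypothetical.

```python
class BitWindow:
    """Sliding window of the last `size` transactions, one bit sequence per item."""

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1  # keeps only the newest `size` bits
        self.bits = {}               # item -> int used as a bit sequence

    def add(self, transaction):
        present = set(transaction)
        # Shift every sequence left by one; the oldest bit is dropped by the mask.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if item in present:
                shifted |= 1
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]  # item no longer occurs in the window
        # Items not tracked yet start with only the newest bit set.
        for item in present.difference(self.bits):
            self.bits[item] = 1

    def support(self, itemset):
        """Count window transactions containing every item of a non-empty itemset."""
        combined = self.mask
        for item in itemset:
            combined &= self.bits.get(item, 0)
        return bin(combined).count("1")


if __name__ == "__main__":
    window = BitWindow(size=3)
    for txn in (["a", "b"], ["a"], ["a", "b"], ["b"]):
        window.add(txn)
    print(window.support({"a", "b"}))
```

With this layout, the support of an item set is a bitwise AND followed by a bit count, and sliding the window is a shift, so no transaction needs to be rescanned.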


2013 ◽  
Vol 385-386 ◽  
pp. 1415-1418
Author(s):  
Yan Yang Guo ◽  
Gang Wang ◽  
Feng Mei Hou ◽  
Qing Ling Mei

This paper introduces FCW_MRFI, a frequent item mining algorithm for streaming data based on variable windows. The FCW_MRFI algorithm can mine frequent items in any window of recent streaming data whose given length is L. It also divides recent streaming data into several windows of variable length according to m, which is the number of the counter array. This algorithm achieves a smaller query error in recent windows and minimizes the maximum query error over the whole of the recent streaming data.


2012 ◽  
Vol 263-266 ◽  
pp. 2179-2184
Author(s):  
Zhen Yun Liao ◽  
Xiu Fen Fu ◽  
Ya Guang Wang

The first step of the Apriori association rule mining algorithm generates many candidate item sets that are not frequent item sets, and these item sets consume substantial system resources. To solve this problem, this paper presents an improved algorithm, based on Apriori, that improves the Apriori pruning step. This method effectively reduces the large number of useless candidate item sets and also reduces the number of checks of whether item sets are frequent. Experimental results show that the improved algorithm is more efficient than the classic Apriori algorithm.
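For context, the classic Apriori join-and-prune step that this abstract aims to improve can be sketched as below. This shows only the textbook pruning rule (a k-item candidate is kept only if all of its (k-1)-item subsets are frequent). The paper's improved pruning is not described here and is not reproduced. The function name `generate_candidates` is hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-item sets into k-item candidates, then prune."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of the candidate must itself be frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


if __name__ == "__main__":
    frequent_2 = {frozenset(p) for p in (("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"))}
    print(generate_candidates(frequent_2, k=3))
```

In the usage example, {a, b, d} and {b, c, d} are joined but then pruned because {a, d} and {c, d} are not frequent, so only {a, b, c} needs its support counted.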


2017 ◽  
Vol 8 (1) ◽  
pp. 31-43
Author(s):  
Zuber Shaikh ◽  
Antara Mohadikar ◽  
Rachana Nayak ◽  
Rohith Padamadan

Frequent itemsets refer to a set of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for different data mining algorithms. The intended method implements a basic completeness verification and authentication approach in which the client uses a set of frequent item sets as evidence and checks whether the server has missed any of these evidence item sets in its returned result. This helps the client detect an untrusted server and makes the system more efficient by reducing time. In the authentication process, CaRP is both a captcha and a graphical password scheme. CaRP addresses a number of security problems altogether, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.


2012 ◽  
Vol 433-440 ◽  
pp. 4457-4462
Author(s):  
Jun Shan Tan ◽  
Zhu Fang Kuang ◽  
Guo Gui Yang

The design of the synopsis structure is an important issue in frequent pattern mining over data streams. This paper proposes FPD-Graph, a data stream synopsis structure based on a directed graph. The FPD-Graph contains a list head node, FPDG-Head, and list nodes, FPDG-Node. Its operations consist of insertion and deletion. The paper also proposes DGFPM, a frequent pattern mining algorithm based on a sliding window over a data stream. Customer shopping data produced by the IBM synthetic data generator are used as the experimental data. The DGFPM algorithm not only mines frequent patterns with high precision but also has low processing time.


Author(s):  
Jia-Ling Koh ◽  
Shu-Ning Shin ◽  
Yuan-Bin Don

A data stream, which is an unbounded sequence of data elements generated at a rapid rate, provides a dynamic environment for collecting data sources. The knowledge embedded in a data stream is likely to change quickly as time goes by. Therefore, catching the recent trend of the data is an important issue when mining frequent itemsets over data streams. Although the sliding window model provides a good solution to this problem, the traditional approach has to maintain the complete occurrence information of patterns within a sliding window. To estimate the approximate supports of patterns within a sliding window, the frequency changing point (FCP) method is proposed for monitoring the recent occurrences of itemsets over a data stream. In addition to a basic design proposed under the assumption that exactly one transaction arrives at each time point, the FCP method is extended to maintain recent patterns over a data stream in which a block of a varying number of transactions (zero or more) is input within a fixed time unit. The recently frequent itemsets or representative patterns are then discovered approximately from the maintained structure. Experimental studies demonstrate that the proposed algorithms achieve high true positive rates and guarantee no false dismissals in the results. A theoretical analysis is provided for the guarantee. In addition, the authors’ approach significantly reduces run-time memory usage compared with the previously proposed method.


2021 ◽  
Vol 15 (02) ◽  
pp. 33-41
Author(s):  
Wendy Osborn

In this paper, the problem of query processing in spatial data streams is explored, with a focus on the spatial join operation. Although the spatial join has been utilized in many proposed centralized and distributed query processing strategies, for its application to spatial data streams the spatial join operation has received very little attention. One identified limitation with existing strategies is that a bounded region of space (i.e., spatial extent) from which the spatial objects are generated needs to be known in advance. However, this information may not be available. Therefore, two strategies for spatial data stream join processing are proposed where the spatial extent of the spatial object stream is not required to be known in advance. Both strategies estimate the common region that is shared by two or more spatial data streams in order to process the spatial join. An evaluation of both strategies includes a comparison with a recently proposed approach in which the spatial extent of the data set is known. Experimental results show that one of the strategies performs very well at estimating the common region of space using only incoming objects on the spatial data streams. Other limitations of this work are also identified.
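One simple way to estimate a common region using only incoming objects is sketched below. It is not either of the paper's two strategies, which are not detailed in this abstract. The sketch assumes objects arrive as axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, keeps a running bounding box per stream, and intersects those boxes. The names `ExtentEstimator` and `common_region` are hypothetical.

```python
class ExtentEstimator:
    """Running bounding box of the objects seen so far on one stream."""

    def __init__(self):
        self.box = None  # (xmin, ymin, xmax, ymax), or None before any object

    def observe(self, rect):
        if self.box is None:
            self.box = tuple(rect)
        else:
            self.box = (
                min(self.box[0], rect[0]),
                min(self.box[1], rect[1]),
                max(self.box[2], rect[2]),
                max(self.box[3], rect[3]),
            )


def common_region(boxes):
    """Intersection of the estimated extents, or None if they do not overlap."""
    if not boxes or any(b is None for b in boxes):
        return None
    xmin = max(b[0] for b in boxes)
    ymin = max(b[1] for b in boxes)
    xmax = min(b[2] for b in boxes)
    ymax = min(b[3] for b in boxes)
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, ymin, xmax, ymax)


if __name__ == "__main__":
    s1, s2 = ExtentEstimator(), ExtentEstimator()
    s1.observe((0, 0, 2, 2))
    s1.observe((3, 3, 5, 5))
    s2.observe((4, 1, 8, 6))
    print(common_region([s1.box, s2.box]))
```

Because each estimate only grows as objects arrive, the estimated common region can change over time, so join candidates would be filtered against the current estimate rather than a fixed extent.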

