Data Stream Frequent Closed Item Sets Mining Based on Fast Sliding Window

2011 ◽  
Vol 130-134 ◽  
pp. 3702-3707
Author(s):  
Zhi Hua Chen ◽  
Jun Luo

In view of the mobility and continuity of data streams, this paper presents an algorithm called NSWR that mines frequent item sets from a fast sliding window over a data stream, meeting the need to obtain frequent item sets over recently arrived data. The algorithm stores data using an effective bit-sequence representation of items based on the data stream sliding window; supports queries under different support thresholds through a hash-table-based query method over the frequent closed item set results; and offers a screening method based on the classification of closed item sets, which reduces the number of item sets that need closure judgments and thus the computational complexity. Experiments show that the algorithm has better time and space efficiency.
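The bit-sequence idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's NSWR implementation: the class name `BitWindow` and its methods are hypothetical, and the sketch assumes a window that slides by one transaction at a time, with each item's occurrences stored as the bits of a Python integer.

```python
class BitWindow:
    """Sliding window over the last `size` transactions (illustrative only).

    Each item maps to an integer whose low `size` bits record which
    transactions in the window contain that item (bit 0 = newest).
    """

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1
        self.filled = 0   # number of transactions currently in the window
        self.bits = {}    # item -> bit sequence

    def slide(self, transaction):
        self.filled = min(self.filled + 1, self.size)
        # Shift every sequence by one slot; the oldest transaction falls off.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]  # item no longer occurs in the window
        # Mark the newest slot for every item of the arriving transaction.
        for item in set(transaction):
            self.bits[item] = self.bits.get(item, 0) | 1

    def support(self, itemset):
        # AND the sequences: a surviving 1-bit is a transaction that
        # contains every item of the item set.
        acc = (1 << self.filled) - 1
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")
```

With a window of size 3, sliding in `["a", "b"]`, `["a"]` and `["a", "b", "c"]` and then calling `support(["a", "b"])` counts the two transactions that contain both items. Sliding and support counting reduce to shifts and bitwise ANDs, which is what makes this representation compact.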

2019 ◽  
Vol 30 (3) ◽  
pp. 71-93
Author(s):  
Saubhik Paladhi ◽  
Sankhadeep Chatterjee ◽  
Takaaki Goto ◽  
Soumya Sen

Frequent item-set mining has been studied exhaustively in the last decade. Several successful approaches have been proposed to identify the maximal frequent item-sets from a set of typical item-sets. The present work introduces a novel pruning mechanism that has proved to be significantly time-efficient. The technique is based on the Artificial Cell Division (ACD) algorithm, which has been found to be highly successful in solving tasks that involve a multi-way search of the search space. The necessity conditions of the ACD process have been modified to suit the pruning procedure. The proposed algorithm has been compared with the Apriori algorithm implemented in WEKA. The experimental results show the superiority of AFARTICA over the Apriori algorithm. The results also indicate that the proposed algorithm performs better when the support threshold is higher for the same set of item-sets.


2016 ◽  
Vol 13 (10) ◽  
pp. 7467-7474
Author(s):  
Venu Madhav Kuthadi ◽  
Rajalakshmi Selvaraj

A data stream is a continuous sequence of data elements generated from a specified source. Mining frequent item sets in dynamic databases and data streams raises challenges that make the task harder than in static databases. Many methods have been developed for frequent item set mining, but they share the familiar problems of memory usage and processing time, because data elements in a stream arrive at a rapid rate and the incoming data is unbounded and possibly infinite. Given the high speed and large volume of incoming data, a frequent item set mining algorithm must work within limited memory and processing time. To address this drawback of existing methods, this paper proposes a new algorithm, named CFIM, for mining closed frequent item sets from data streams based on their utility and consistency. During closed frequent item set mining, a hash table is maintained to check whether a given item set is closed or not. Computing closed frequent item sets from the data stream minimizes memory usage and processing time. The performance of the proposed technique is analyzed on a synthetic data set and compared with existing mining techniques.
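As an illustration of a hash-table closedness check like the one mentioned above, the following sketch uses the standard definition: an item set is closed if no proper superset has the same support. The abstract does not say how CFIM keys its hash table; grouping item sets by support count is an assumption made here, and the function name `closed_itemsets` is hypothetical.

```python
from collections import defaultdict


def closed_itemsets(supports):
    """Return the closed item sets among `supports`.

    `supports` maps frozenset(items) -> support count. Keying the hash
    table by support count is an assumption, not the paper's design.
    """
    by_support = defaultdict(list)
    for itemset, count in supports.items():
        by_support[count].append(itemset)

    closed = {}
    for itemset, count in supports.items():
        # Only item sets with the same support can rule out closedness,
        # so the lookup is restricted to a single hash bucket.
        if not any(itemset < other for other in by_support[count]):
            closed[itemset] = count
    return closed
```

For example, with supports `{a}: 3`, `{b}: 2` and `{a, b}: 2`, the item set `{b}` is not closed because `{a, b}` has the same support, while `{a}` and `{a, b}` are closed.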


2021 ◽  
Vol 16 ◽  
pp. 261-269
Author(s):  
Raja Azhan Syah Raja Wahab ◽  
Siti Nurulain Mohd Rum ◽  
Hamidah Ibrahim ◽  
Fatimah Sidi ◽  
Iskandar Ishak

A data stream is a series of data generated sequentially over time from different sources. Processing such data is very important in many contemporary applications such as sensor networks, RFID technology, mobile computing and many more. The huge amount of data generated, and its frequent changes within a short time, make conventional processing methods insufficient. The Sliding Window Model (SWM) was introduced by Datar et al. to handle this problem. The main challenges are avoiding multiple scans of the whole data set, optimizing memory usage, and processing only the most recent tuples. In uncertain data, the number of possible world instances grows exponentially, and it is highly difficult to determine what it takes to complete top-k query processing in the shortest amount of time. Following the generation of rules and the probability theory of this model, a framework is envisaged that sustains a top-k processing algorithm over the SWM until the candidates expire. Based on the literature review, no existing work tackles the issues that arise from top-k query processing over the possible world instances of uncertain data streams within the SWM. The major issues in these scenarios need to be addressed, especially the computational redundancy that increases computational cost within the SWM. Therefore, the main objective of this research is to propose top-k query processing methods over uncertain data streams in the SWM utilizing the score and the Possible World (PW) setting. In this study, a novel expiration and object indexing method is introduced to address the computational redundancy issues. We believe the proposed method can reduce computational costs by managing the insertion and exit policy for the right tuple candidates within a specified window frame. This research will contribute to the area of computational query processing.


Author(s):  
Jia-Ling Koh ◽  
Shu-Ning Shin ◽  
Yuan-Bin Don

Recently, the data stream, which is an unbounded sequence of data elements generated at a rapid rate, has provided a dynamic environment for collecting data sources. The knowledge embedded in a data stream is likely to change quickly as time goes by. Therefore, capturing the recent trend of the data is an important issue when mining frequent itemsets over data streams. Although the sliding window model offers a good solution to this problem, the traditional approach has to maintain the occurrence information of patterns within a sliding window completely. For estimating the approximate supports of patterns within a sliding window, the frequency changing point (FCP) method is proposed for monitoring the recent occurrences of itemsets over a data stream. In addition to a basic design proposed under the assumption that exactly one transaction arrives at each time point, the FCP method is extended to maintain recent patterns over a data stream where a block containing a varying number of transactions (zero or more) is input within a fixed time unit. Accordingly, the recently frequent itemsets or representative patterns are discovered approximately from the maintained structure. Experimental studies demonstrate that the proposed algorithms achieve high true positive rates and guarantee no false dismissals in the results. A theoretical analysis is provided for the guarantee. In addition, the authors’ approach significantly outperforms the previously proposed method in terms of run-time memory usage.


Author(s):  
Nibedita Panigrahi ◽  
P.K. Pattnaik ◽  
S.K. Padhi

Data mining is a part of the knowledge discovery in databases (KDD) process. As technology advances, floods of data can be produced and shared in many applications such as wireless sensor networks or Web click streams. This calls for extracting useful information and knowledge from streams of data. In this paper, we propose an efficient algorithm with which the current frequencies of all frequent item sets can be produced immediately at any time. The current frequency of an item set in a stream is defined as its maximal frequency over all possible windows in the stream from any point in the past until the current state. The experimental results show that the proposed algorithm not only maintains a small summary of information for each item set but also consumes less memory than existing algorithms for mining frequent item sets over recent data streams.
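The definition of current frequency given above can be made concrete with a naive sketch. This is not the proposed algorithm, which maintains a small summary instead of re-reading the stream. It assumes that the frequency of an item set in a window is the fraction of that window's transactions containing it, and the function name `max_frequency` is hypothetical.

```python
def max_frequency(stream, itemset):
    """Maximal frequency of `itemset` over all windows ending at the
    latest transaction (naive rescan of the stored stream).

    `stream` is a list of transactions, oldest first.
    """
    target = set(itemset)
    best = 0.0
    hits = 0
    # Grow the window backwards from the newest transaction.
    for length, transaction in enumerate(reversed(stream), start=1):
        if target <= set(transaction):
            hits += 1
        best = max(best, hits / length)
    return best
```

For the stream `[{"a"}, {"a", "b"}, {"b"}, {"a"}]`, the item set `{"b"}` reaches its maximal frequency of 2/3 in the window covering the last three transactions.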


2012 ◽  
Vol 461 ◽  
pp. 355-359
Author(s):  
Wen Chuan Yang ◽  
Ying Hua Song ◽  
Ting Xi Gou

Top-quality, efficient service is increasingly important in the telecom sector. One of the challenging issues is dealing with atypical incidents. While traditional mining algorithms focus on high-frequency item sets, a de-noising algorithm for atypical incidents remains an open problem. This paper proposes a de-noising model based on the sliding window. In this model, the FP-tree and multi-association rules are introduced to fix the thresholds of the sliding window. Experimental results demonstrate that the proposed algorithm can provide an appropriate data set for the knowledge discovery of atypical incidents.


2014 ◽  
Vol 602-605 ◽  
pp. 3268-3271
Author(s):  
Zhi Zhang ◽  
Qi Fu

To meet the demand for uncertain data stream mining in large dynamic databases, a probabilistic frequent item mining algorithm based on the sliding window is proposed. The mass data in the database is regarded as a data stream. In the window model of the data stream, the frequent item set is extracted according to the probability frequency distribution information of the data. Compared with traditional algorithms, the proposed algorithm overcomes the constraint of mining only certain data streams and mitigates the tendency to lose relevant information. The true information of the data is fully reflected, and the most accurate frequent items are mined. Simulation results show that the new algorithm mines frequent items accurately, with a higher accuracy rate than the traditional method, and that it processes the data quickly. It provides an effective strategy for analyzing large databases and can meet the memory and performance requirements of database analysis and mining.

