Discovering Frequent Itemsets Reflected User Characteristics Using Weighted Batch based on Data Stream

Drawing on the characteristics of data streams and the sliding-window model, this paper proposes A-MFI, a new algorithm for mining maximal frequent itemsets in data streams that is based on a self-adjusting, orderly-compound policy. Working on basic windows, the algorithm updates its information from fragments of the stream and scans the stream only once, storing what it gathers in a frequent-itemsets list as the data flows. The core idea is to construct a self-adjusting, orderly-compound FP-tree, to use mixed subset pruning techniques to reduce the search space, and to merge nodes that have equal minsup in the same branch, compressing the tree into an orderly-compound FP-tree that avoids superset checking when mining maximal frequent itemsets. The experimental results show that the algorithm achieves higher time and space efficiency and good scalability.
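
The abstract does not spell out how nodes are merged, so the following is only a minimal sketch of one plausible reading: adjacent nodes on a single FP-tree branch that carry the same support count are collapsed into one compound node. The function name `compress_branch` and the `(item, count)` representation are assumptions made for illustration and are not taken from the paper.

```python
from itertools import groupby


def compress_branch(branch):
    """Collapse adjacent nodes on one branch that share a support count.

    `branch` is a list of (item, count) pairs ordered from root to leaf.
    Returns a list of (items, count) compound nodes, where `items` is a tuple.
    This is an illustrative assumption, not the published A-MFI procedure.
    """
    compound = []
    # groupby only groups consecutive nodes, which matches "same branch".
    for count, run in groupby(branch, key=lambda node: node[1]):
        items = tuple(item for item, _ in run)
        compound.append((items, count))
    return compound


if __name__ == "__main__":
    branch = [("a", 5), ("b", 5), ("c", 3), ("d", 3), ("e", 1)]
    for items, count in compress_branch(branch):
        print(items, count)
```

Under this reading, a run of equal-count nodes always occurs together in the same transactions, which is why treating the run as a single unit loses no support information.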

Exploring Calendar-Based Pattern Mining in Data Streams

Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development, 2010, pp. 342-360

DOI: 10.4018/978-1-60566-748-5.ch016

Author(s): Rodrigo Salvador Monteiro, Geraldo Zimbrão, Holger Schwarz, Bernhard Mitschang, Jano Moreira de Souza

Keyword(s): Data Warehouse, Data Streams, Data Stream, Pattern Mining, A Priori, Frequent Itemsets, Detailed Data, Series Of Experiments, Working Day

Calendar-based pattern mining aims at identifying patterns on specific calendar partitions. Examples of calendar partitions are every Monday, every first working day of each month, and every holiday. Providing flexible mining capabilities for calendar-based partitions is especially challenging in a data stream scenario: the calendar partitions of interest are not known a priori, and at each point in time only a subset of the detailed data is available. The authors show how a data warehouse approach can be applied to this problem. The data warehouse, which keeps track of frequent itemsets holding on different partitions of the original stream, has low storage requirements. Nevertheless, it allows the derivation of pattern sets that are complete and precise. Furthermore, the authors demonstrate the effectiveness of their approach through a series of experiments.
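
To make the notion of a calendar partition concrete, here is a minimal sketch that groups timestamped transactions by weekday and counts frequent itemsets separately in each group. It is a brute-force illustration only and does not reproduce the authors' data warehouse design; the function name `frequent_itemsets_by_weekday`, the choice of weekday as the partition, and the sample data are all assumptions.

```python
from collections import Counter
from datetime import date
from itertools import combinations

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def frequent_itemsets_by_weekday(transactions, min_support):
    """Count itemsets per weekday partition.

    `transactions` is an iterable of (date, set_of_items) pairs.
    `min_support` is the minimum fraction of a partition's transactions
    that must contain an itemset for it to be reported.
    """
    counts = {}          # weekday name -> Counter of itemsets
    totals = Counter()   # weekday name -> number of transactions
    for day, items in transactions:
        key = WEEKDAYS[day.weekday()]
        totals[key] += 1
        counter = counts.setdefault(key, Counter())
        # Enumerate every non-empty subset; fine for tiny examples only.
        for size in range(1, len(items) + 1):
            for itemset in combinations(sorted(items), size):
                counter[itemset] += 1
    return {
        key: {s: c for s, c in counter.items() if c / totals[key] >= min_support}
        for key, counter in counts.items()
    }


if __name__ == "__main__":
    stream = [
        (date(2010, 1, 4), {"bread", "milk"}),          # a Monday
        (date(2010, 1, 11), {"bread", "milk", "eggs"}),  # a Monday
        (date(2010, 1, 5), {"eggs"}),                    # a Tuesday
    ]
    for weekday, itemsets in frequent_itemsets_by_weekday(stream, 1.0).items():
        print(weekday, itemsets)
```

A pattern such as {bread, milk} can hold on every Monday while being infrequent over the stream as a whole, which is why partitions have to be tracked separately.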

Maximal Frequent Itemsets in Data Stream Mining Based on Orderly-Compound Policy

Applied Mechanics and Materials, 2010, Vol 26-28, pp. 113-117

DOI: 10.4028/www.scientific.net/amm.26-28.113

Author(s): Pei Shuai Chen, Chong Huan Xu

Keyword(s): Data Stream, Frequent Itemsets, Space Efficiency, Time Space, Algorithm Construct, Closed Itemsets, Maximal Frequent Itemsets, Mining Frequent Itemsets, Basic Window, Pruning Technique

Mining maximal frequent itemsets has the advantage of producing a relatively small number of itemsets. Compared with mining all frequent itemsets or frequent closed itemsets, such algorithms have higher time and space efficiency. Drawing on the characteristics of data streams and the sliding-window model, this paper proposes E-FPMFI, a new algorithm for mining maximal frequent itemsets in data streams that is based on an orderly-compound policy. Working on basic windows, the algorithm updates its information from fragments of the stream and scans the stream only once, storing what it gathers in a frequent-itemsets list. The algorithm constructs an FP-tree, then compresses the ordered FP-tree by merging nodes that have equal minsup in the same branch; it also uses a mixed subset pruning technique and avoids superset checking. The experimental results show that the algorithm achieves higher time and space efficiency and good scalability.
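
The claim that maximal frequent itemsets form a comparatively small set can be illustrated with a brute-force sketch. This is not E-FPMFI: it enumerates candidates level by level and then filters with exactly the superset check the paper aims to avoid. The function names and the sample transactions are assumptions chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support} for every itemset in >= min_count transactions.

    `transactions` is a list of sets. Brute force, suitable for toy data only.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_count:
                frequent[frozenset(candidate)] = support
                found = True
        if not found:
            # Anti-monotonicity: no larger itemset can be frequent either.
            break
    return frequent


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
    all_frequent = frequent_itemsets(data, min_count=2)
    print(len(all_frequent), "frequent itemsets")
    print(maximal_itemsets(all_frequent))
```

Every frequent itemset is a subset of some maximal one, so the maximal set is a compact summary, although the exact supports of the subsets cannot be recovered from it.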

Maximal and closed frequent itemsets mining from uncertain database and data stream

International Journal of Data Science, 2019, Vol 4 (3), pp. 237

DOI: 10.1504/ijds.2019.102792

Author(s): Maliha Momtaz, Abu Ahmed Ferdaus, Chowdhury Farhan Ahmed, Mohammad Samiullah

Keyword(s): Data Stream, Frequent Itemsets, Closed Frequent Itemsets, Frequent Itemsets Mining, Uncertain Database

An Approximate Approach for Maintaining Recent Occurrences of Itemsets in a Sliding Window over Data Streams

Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development, 2010, pp. 308-327

DOI: 10.4018/978-1-60566-748-5.ch014

Author(s): Jia-Ling Koh, Shu-Ning Shin, Yuan-Bin Don

Keyword(s): Data Streams, Data Stream, Traditional Approach, Experimental Studies, Dynamic Environment, Sliding Window, Fixed Time, Frequent Itemsets, Embedded Knowledge, Data Elements

A data stream, an unbounded sequence of data elements generated at a rapid rate, provides a dynamic environment for collecting data sources. The knowledge embedded in a data stream is likely to change quickly as time goes by. Therefore, capturing the recent trend of the data is an important issue when mining frequent itemsets over data streams. Although the sliding window model offers a good solution to this problem, the traditional approach has to maintain the complete occurrence information of patterns within the sliding window. To estimate the approximate supports of patterns within a sliding window, the frequency changing point (FCP) method is proposed for monitoring the recent occurrences of itemsets over a data stream. In addition to a basic design, which assumes that exactly one transaction arrives at each time point, the FCP method is extended to maintain recent patterns over a data stream in which a block containing a varying number of transactions (possibly zero) arrives within a fixed time unit. The recently frequent itemsets or representative patterns are then discovered approximately from the maintained structure. Experimental studies demonstrate that the proposed algorithms achieve high true positive rates and guarantee no false dismissals in the results. A theoretical analysis is provided for this guarantee. In addition, the authors' approach significantly reduces run-time memory usage compared with the previously proposed method.
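
For contrast with the approximate FCP method, the following is a minimal sketch of the exact baseline that the abstract calls the traditional approach: every transaction in the window is retained so that its itemsets can be subtracted when it expires. It assumes one transaction per time point; the class name `SlidingWindowCounter` and the `max_itemset_size` cap are assumptions introduced to keep the example small, and the FCP structure itself is not reproduced here.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Exact itemset supports over the last `window_size` transactions."""

    def __init__(self, window_size, max_itemset_size=2):
        self.window_size = window_size
        self.max_itemset_size = max_itemset_size
        self.window = deque()     # full transactions must be kept for expiry
        self.counts = Counter()   # itemset (sorted tuple) -> support in window

    def _itemsets(self, transaction):
        items = sorted(transaction)
        for size in range(1, min(self.max_itemset_size, len(items)) + 1):
            yield from combinations(items, size)

    def add(self, transaction):
        """Insert one transaction and expire the oldest one if needed."""
        self.window.append(frozenset(transaction))
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(self._itemsets(expired))
            self.counts = +self.counts  # drop itemsets whose count fell to zero

    def frequent(self, min_support):
        """Itemsets contained in at least `min_support` of the window."""
        n = len(self.window)
        return {s: c for s, c in self.counts.items() if n and c / n >= min_support}


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_size=2)
    for transaction in [{"a", "b"}, {"a"}, {"b", "c"}]:
        counter.add(transaction)
    print(counter.frequent(0.5))
```

The memory cost of this baseline grows with the window size because whole transactions are stored, which is the overhead an approximate summary is meant to reduce.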
