Frequent items mining on data stream using hash-table and heap

To account for the mobility and continuity of data streams, this paper presents an algorithm called NSWR that mines frequent item sets from a fast sliding window over a data stream, meeting the need to obtain frequent item sets over recently arrived data. The algorithm makes three contributions: it stores data using an effective bit-sequence representation of items based on the sliding window; it supports queries under different support thresholds through a hash-table-based query method for frequent closed item set results; and it offers a screening method, based on a classification of closed item sets, that reduces the number of item sets requiring closure checks and thereby the computational complexity. Experiments show that the algorithm has better time and space efficiency.
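The bit-sequence idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `BitSequenceWindow`, its methods, and the choice of a Python integer as the bit-sequence are assumptions made for illustration. Each item keeps one bit per transaction in the window, and the support of an item set is the number of set bits in the bitwise AND of its items' sequences.

```python
class BitSequenceWindow:
    """Sliding window over the last `size` transactions (hypothetical sketch).

    Each item maps to an integer used as a bit-sequence: bit 0 is the newest
    transaction in the window, bit size-1 the oldest.
    """

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1   # keeps only the last `size` bits
        self.bits = {}                # item -> bit-sequence

    def add(self, transaction):
        # Slide the window: shift every sequence and drop the oldest bit.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]   # item no longer occurs in the window
        # Mark the items of the newest transaction.
        for item in set(transaction):
            self.bits[item] = self.bits.get(item, 0) | 1

    def support(self, itemset):
        # AND the sequences: a bit survives only if every item occurred
        # in that transaction.
        acc = self.mask
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")


window = BitSequenceWindow(size=3)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]:
    window.add(t)
print(window.support({"a", "b"}))  # transactions containing both a and b
```

In this sketch each insertion touches every item currently in the window, so it shows the representation only and makes no claim about the efficiency reported in the paper.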


Real-Time Topic Detection with Dynamic Windows

The Computer Journal ◽

10.1093/comjnl/bxz042 ◽

2019 ◽

Vol 63 (3) ◽

pp. 469-478

Author(s):

Na Su ◽

Shujuan Ji ◽

Jimin Liu

Keyword(s):

Data Analysis ◽

Real Time ◽

Data Stream ◽

Clustering Algorithm ◽

Vector Space Model ◽

Topic Detection ◽

Dynamic Clustering ◽

Improve Performance ◽

Space Model ◽

Frequent Items

Microblogging is a popular form of social networking in which hot topics propagate rapidly online. Real-time topic detection not only helps in understanding public opinion but also has high commercial value. We design a method for real-time microblog data analysis that detects popular long-lasting events as well as emerging events. First, an algorithm for mining frequent items on the microblog data stream is proposed to count approximate word frequencies; it finds the words that are frequent over a period of time. Second, the window size of the monitored words is adjusted dynamically according to the duration and evolution of events. Finally, new topics and trends in existing topics are detected using a dynamic clustering algorithm based on the vector space model. Experimental results show that the proposed algorithms improve performance in terms of running time and accuracy.
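The abstract does not say which approximate counting scheme is used. As a stand-in, the sketch below uses the classic Misra-Gries summary, which keeps at most k-1 counters and underestimates each true count by at most n/k over a stream of n words. The function name `misra_gries` and the sample stream are illustrative assumptions, not taken from the paper.

```python
def misra_gries(stream, k):
    """Approximate word counts using at most k-1 counters (Misra-Gries)."""
    counters = {}
    for word in stream:
        if word in counters:
            counters[word] += 1
        elif len(counters) < k - 1:
            counters[word] = 1
        else:
            # No free counter: decrement all and discard those reaching zero.
            for w in list(counters):
                counters[w] -= 1
                if counters[w] == 0:
                    del counters[w]
    return counters


words = "a b a c a d a e a".split()
summary = misra_gries(words, k=3)
print(summary)  # estimated counts of the surviving candidate words
```

Any word occurring more than n/k times is guaranteed to survive in the summary, which is why this kind of counter suits the detection of frequent words in a stream.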


Efficient Algorithms with Time Fading Model for Mining Frequent Items over Data Stream

2009 International Conference on Industrial and Information Systems ◽

10.1109/iis.2009.48 ◽

2009 ◽

Author(s):

Li Tu ◽

Ling Chen ◽

Shan Zhang

Keyword(s):

Data Stream ◽

Efficient Algorithms ◽

Frequent Items


An Algorithm for Mining Frequent Stream Data Items Using Hash Function and Fading Factor

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.130-134.2661 ◽

2011 ◽

Vol 130-134 ◽

pp. 2661-2665 ◽

Cited By ~ 3

Author(s):

Qing Ling Mei ◽

Ling Chen

Keyword(s):

Processing Speed ◽

Hash Function ◽

Data Stream ◽

Computation Time ◽

Experimental Results ◽

Memory Requirement ◽

Stream Data ◽

Hash Tables ◽

Frequent Items ◽

Fading Factor

A new algorithm for mining the frequent items in a data stream is presented. The algorithm adopts a time fading factor to emphasize the importance of relatively newer data, and records the densities of the data items in hash tables. For a given density threshold S and an integer k, the algorithm can mine the top k frequent items. The computation time for processing each data item is O(1). Experimental results show that the algorithm outperforms other methods in terms of accuracy, memory requirement, and processing speed.
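One way to combine a fading factor with a hash table is sketched below, under stated assumptions: the class name `FadingCounter`, the default factor 0.98, and the lazy-decay bookkeeping are illustrative choices and not the paper's method. Each entry stores its density together with the time of its last update, so only the arriving item is touched and each update costs O(1). The top-k query uses a heap via `heapq.nlargest`.

```python
import heapq


class FadingCounter:
    """Time-faded item densities in a hash table (hypothetical sketch)."""

    def __init__(self, fading=0.98):
        self.fading = fading   # assumed decay factor in (0, 1)
        self.now = 0           # number of items seen so far
        self.table = {}        # item -> (density, time of last update)

    def add(self, item):
        # Decay the stored density by the elapsed time, then count the arrival.
        self.now += 1
        density, last = self.table.get(item, (0.0, self.now))
        decayed = density * self.fading ** (self.now - last)
        self.table[item] = (decayed + 1.0, self.now)

    def density(self, item):
        # Density decayed up to the current time, without modifying the table.
        density, last = self.table.get(item, (0.0, self.now))
        return density * self.fading ** (self.now - last)

    def top_k(self, k, threshold=0.0):
        # Keep items whose current density reaches the threshold, take the k largest.
        current = ((self.density(item), item) for item in self.table)
        return heapq.nlargest(k, (pair for pair in current if pair[0] >= threshold))


counter = FadingCounter(fading=0.5)
for item in ["a", "b", "b"]:
    counter.add(item)
print(counter.top_k(2, threshold=0.5))  # items at or above density 0.5
```

This sketch never evicts entries, so its memory grows with the number of distinct items; a bounded-memory variant would need a pruning rule, which the abstract does not describe.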
