Frequent items mining on data stream using hash-table and heap

Author(s):  
Zhang Shan ◽  
Chen Ling ◽  
Tu Li
Author(s):  
Regant Y. S. Hung ◽  
Kwok Fai Lai ◽  
Hing Fung Ting

2011 ◽  
Vol 130-134 ◽  
pp. 3702-3707
Author(s):  
Zhi Hua Chen ◽  
Jun Luo

Motivated by the mobility and continuity of data streams, this paper presents an algorithm called NSWR that mines frequent item sets from a fast sliding window over a data stream, meeting the need to obtain the frequent item sets of recently arrived data. The algorithm stores the data in an efficient bit-sequence representation of the items in the sliding window; supports queries with different support thresholds through a hash-table-based method for querying the resulting frequent closed item sets; and screens candidates by classifying closed item sets, which reduces the number of item sets that need closure checks and thereby the computational complexity. Experiments show that the algorithm has better time and space efficiency.
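
The bit-sequence idea in the abstract above can be illustrated with a short sketch. It is not the paper's implementation: the class name `BitSequenceWindow`, its methods, and the choice of Python integers as bit sequences are all assumptions made for illustration, and it covers only support counting, not the closed item set screening.

```python
class BitSequenceWindow:
    """Per-item bit sequences over a transaction sliding window (hypothetical sketch)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.mask = (1 << window_size) - 1  # keeps only the newest window_size bits
        self.filled = 0                     # transactions currently in the window
        self.bits = {}                      # item -> int; lowest bit = newest transaction

    def slide(self, transaction):
        """Append one transaction and drop the oldest one if the window is full."""
        items = set(transaction)
        self.filled = min(self.filled + 1, self.window_size)
        for item in items:
            self.bits.setdefault(item, 0)
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if item in items:
                shifted |= 1
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]  # item no longer occurs anywhere in the window

    def support(self, itemset):
        """Count the window transactions that contain every item of itemset."""
        combined = (1 << self.filled) - 1
        for item in itemset:
            combined &= self.bits.get(item, 0)
        return bin(combined).count("1")


window = BitSequenceWindow(window_size=3)
for transaction in [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b"}]:
    window.slide(transaction)
```

In this sketch the support of an item set is the population count of the bitwise AND of its items' sequences, so sliding the window costs one shift per stored item rather than a rescan of the transactions.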


2019 ◽  
Vol 63 (3) ◽  
pp. 469-478
Author(s):  
Na Su ◽  
Shujuan Ji ◽  
Jimin Liu

Microblogging platforms are popular social networks in which hot topics propagate rapidly online. Real-time topic detection not only helps in understanding public opinion but also brings high commercial value. We design a method for real-time microblog data analysis that detects popular long-lasting events as well as emerging events. First, a frequent-item mining algorithm for microblog data streams is proposed to count approximate word frequencies; it finds the words that are frequent over a period of time. Second, the window size of the monitored words is adjusted dynamically according to the duration and evolution of events. Finally, new topics and trends in existing topics are detected with a dynamic clustering algorithm based on the vector space model. Experimental results show that the proposed algorithms improve running time and accuracy.
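
The abstract above does not name its approximate counting scheme. As a stand-in, the following sketch uses the classic Misra-Gries summary, which is a different, well-known technique and not necessarily what the authors propose; the function name `misra_gries` and the parameter `k` are illustrative choices.

```python
def misra_gries(words, k):
    """Return approximate counts for at most k - 1 candidate frequent words."""
    counters = {}
    for word in words:
        if word in counters:
            counters[word] += 1
        elif len(counters) < k - 1:
            counters[word] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


summary = misra_gries(["a", "b", "a", "c", "a", "b", "d"], k=3)
```

With n words processed, every word occurring more than n/k times is guaranteed to remain in the summary, and each stored count underestimates the true count by at most n/k.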


2011 ◽  
Vol 130-134 ◽  
pp. 2661-2665
Author(s):  
Qing Ling Mei ◽  
Ling Chen

A new algorithm for mining frequent items in a data stream is presented. The algorithm adopts a time fading factor to emphasize the importance of relatively newer data and records the densities of the data items in hash tables. For a given density threshold S and an integer k, the algorithm can mine the top-k frequent items. The computation time for processing each data item is O(1). Experimental results show that the algorithm outperforms other methods in terms of accuracy, memory requirement, and processing speed.
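
A minimal sketch of how a time fading factor and a hash table of densities might fit together is given below. The class `FadingCounter`, the lazy decay rule, the default factor of 0.98, and the heap-based `top_k` query are assumptions for illustration; the abstract does not specify these details, and the sketch does not prune low-density entries, so it does not reproduce the paper's memory behavior.

```python
import heapq


class FadingCounter:
    """Hash table of time-decayed item densities (hypothetical sketch)."""

    def __init__(self, fading_factor=0.98):
        self.fading_factor = fading_factor  # assumed to lie in (0, 1)
        self.now = 0                        # number of items processed so far
        self.table = {}                     # item -> (density, time of last update)

    def add(self, item):
        """Process one stream item in O(1) expected time."""
        self.now += 1
        density, last = self.table.get(item, (0.0, self.now))
        # Decay lazily: only the arriving item's entry is touched.
        decayed = density * self.fading_factor ** (self.now - last)
        self.table[item] = (decayed + 1.0, self.now)

    def density(self, item):
        """Return the item's density decayed to the current time."""
        density, last = self.table.get(item, (0.0, self.now))
        return density * self.fading_factor ** (self.now - last)

    def top_k(self, k, threshold=0.0):
        """Return up to k items with density >= threshold, densest first."""
        current = ((self.density(item), item) for item in self.table)
        dense = [(d, item) for d, item in current if d >= threshold]
        return [item for d, item in heapq.nlargest(k, dense)]


counter = FadingCounter(fading_factor=0.5)
for item in ["a", "a", "b"]:
    counter.add(item)
```

Storing the last update time next to each density is what keeps the per-item update constant-time in this sketch: older entries are decayed only when they are next read or updated, at the cost of a full scan when `top_k` is queried.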

