stream clustering
Recently Published Documents


TOTAL DOCUMENTS

243
(FIVE YEARS 66)

H-INDEX

18
(FIVE YEARS 3)

2022 ◽  
Vol 70 (1) ◽  
pp. 1263-1280
Author(s):  
M. Arun Manicka Raja ◽  
S. Swamynathan

2021 ◽  
Author(s):  
Christian Nordahl ◽  
Veselka Boeva ◽  
Håkan Grahn ◽  
Marie Persson Netz

Data has become an integral part of our society in recent years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements cannot be met in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.


2021 ◽  
Vol 25 (5) ◽  
pp. 1211-1231
Author(s):  
Tham Vo ◽  
Phuc Do

Recently, the rapid growth of social networks and online news resources on the Internet has made text stream clustering an important application in multiple domains (e.g., text retrieval diversification, social event detection, and text summarization). Unlike traditional static text clustering, text stream clustering faces specific challenges related to the rapid change of topics/clusters and the high velocity of incoming document batches. Recent well-known model-based text stream clustering models, such as DTM, DCT, and MStream, are word-independent evaluation approaches, meaning that they largely ignore the relations between words while sampling clusters/topics. This reduces overall model accuracy, especially for short text documents such as comments and microblogs in social networks. To tackle these problems, in this paper we propose a novel graph-of-words (GOW) based text stream clustering approach, called GOW-Stream. Applying the common GOWs generated from each document batch while sampling clusters/topics helps overcome the word-independent evaluation challenge. The proposed GOW-Stream promises to achieve significantly better text stream clustering performance than recent state-of-the-art baselines. Extensive experiments on multiple real-world benchmark datasets demonstrate the effectiveness of the proposed model in both accuracy and running time.


2021 ◽  
Author(s):  
Arun Manicka Raja M ◽  
Swamynathan Sankaranarayanan

Stream analytics, a new paradigm in data analytics, has gained momentum due to the voluminous generation of stream data. With the huge increase in the edits performed on Wikipedia topics, it is tedious for digital knowledge discovery users to find updates in their domain immediately. Users must go through a large amount of information and spend more time to find potentially relevant data. There is a need to retrieve Wikipedia edits based on the metadata of the article edits. Hence, a clustering technique may be employed to group the Wikipedia article edits domain-wise. In this paper, hierarchical stream clustering is applied to retrieve the edits based on user interest. Data from Wikipedia is collected over a period of a month and used as the dataset. Our method is compared with the state-of-the-art clustering system WikiAutoCat, and it is observed that accuracy is improved by 10% and clustering time is reduced by 20%.


2021 ◽  
Vol 1955 (1) ◽  
pp. 012048
Author(s):  
Chunhua Yang ◽  
Cong Wang ◽  
Xiao Hu ◽  
Niankang You ◽  
Xuguang Yang
