Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm

Author(s):  
Di Wu ◽  
Ruixin Yang ◽  
Chao Shen
2012 ◽  
Vol 532-533 ◽  
pp. 1716-1720 ◽  
Author(s):  
Chun Xia Jin ◽  
Hai Yan Zhou ◽  
Qiu Chan Bai

To address keyword sparsity and similarity drift in short text segments, this paper proposes a short text clustering algorithm with feature keyword expansion (STCAFKE). The method expands feature keywords using HowNet and clusters the expanded texts by combining the K-means algorithm with a density-based algorithm. Feature keyword expansion increases the number of keywords per text and enriches its semantic features. Experimental results show that the algorithm improves short text clustering quality in terms of precision and recall.
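The effect of keyword expansion on short text similarity can be illustrated with a minimal sketch. This is not the STCAFKE implementation: the `RELATED` dictionary is a hypothetical toy lexicon standing in for HowNet lookups, and the K-means and density steps are omitted. It only shows how two short texts with no shared words can become comparable once related concepts are added.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for HowNet lookups (an assumption).
RELATED = {
    "car": ["vehicle"],
    "automobile": ["vehicle"],
    "price": ["cost"],
    "cheap": ["cost"],
}

def expand(tokens):
    """Append related concepts for each token; the original tokens are kept."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(RELATED.get(tok, []))
    return expanded

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc_a = ["car", "price"]
doc_b = ["automobile", "cheap"]

sim_before = cosine(doc_a, doc_b)                  # no shared words
sim_after = cosine(expand(doc_a), expand(doc_b))   # shared expanded concepts
print(sim_before, sim_after)
```

In this toy case the raw texts share no words, while the expanded texts share the concepts "vehicle" and "cost", so a distance-based clusterer would have something to group them on.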


2014 ◽  
Vol 519-520 ◽  
pp. 842-845 ◽  
Author(s):  
Li Hong Wang

In Chinese text clustering, short text differs markedly from traditional long text, principally in its low word frequencies. As a result, traditional text feature extraction and weight calculation methods are not directly suitable for short text clustering. To solve the problem of clustering drift in short text segments, this paper proposes a feature extraction method that improves weight calculation based on word co-occurrence. Experiments show that the method outperforms the traditional TF-IDF method in Chinese short text clustering.
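The abstract does not give the weighting formula, so the following is only a rough sketch of what a co-occurrence-based weight could look like. The `cooc_weight` function and the sample documents are assumptions for illustration, not the paper's method: each word is scored by how often it appears in the same document as other words.

```python
from collections import Counter
from itertools import combinations

# Toy tokenized short texts (illustrative data, not from the paper).
docs = [
    ["phone", "battery", "life"],
    ["phone", "battery", "charger"],
    ["phone", "screen"],
]

# Count how many documents each unordered word pair appears in together.
cooc = Counter()
for tokens in docs:
    for a, b in combinations(sorted(set(tokens)), 2):
        cooc[(a, b)] += 1

def cooc_weight(word):
    """Illustrative weight: total co-occurrence count of `word` with other words."""
    return sum(n for pair, n in cooc.items() if word in pair)

weights = {w: cooc_weight(w) for tokens in docs for w in tokens}
print(weights)
```

Unlike raw term frequency, which is close to 1 for most words in a short text, a score of this kind draws on evidence from the whole collection, which is the general motivation the abstract describes.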


2021 ◽  
Author(s):  
Leonidas Akritidis ◽  
Miltiadis Alamaniotis ◽  
Athanasios Fevgas ◽  
Panayiotis Bozanis

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 32215-32225 ◽  
Author(s):  
Di Wu ◽  
Mengtian Zhang ◽  
Chao Shen ◽  
Zhuyun Huang ◽  
Mingxing Gu

2014 ◽  
Vol 971-973 ◽  
pp. 1747-1751 ◽  
Author(s):  
Lei Zhang ◽  
Hai Qiang Chen ◽  
Wei Jie Li ◽  
Yan Zhao Liu ◽  
Run Pu Wu

Text clustering is a popular research topic in text mining, and many text clustering methods exist to meet different application requirements. Weibo data is currently acquired through the APIs provided by the major microblogging platforms. This paper discusses algorithms that extract popular topics posted by Weibo users by applying text clustering after large-scale data collection. Because traditional text analysis may not be applicable to the short texts used on Weibo, multiple posts are first combined into long texts based on their features (forwards, comments, followers, etc.) before clustering. Either frequency-based or density-based short text clustering is adequate in most cases: the former is suited to finding hot topics in large collections of Weibo short texts, and the latter to finding abnormal content. Both methods use semantic information to improve clustering accuracy, and both improve clustering performance through parallelism.
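The step of combining several posts into one longer text can be sketched as follows. The record layout and the `root_id` field (the post that a forward or comment refers back to) are hypothetical and are not taken from the Weibo API or from the paper; the sketch only assumes that related posts share some grouping key.

```python
from collections import defaultdict

# Hypothetical post records; "root_id" marks the original post of a thread.
posts = [
    {"id": 1, "root_id": 1, "text": "earthquake reported downtown"},
    {"id": 2, "root_id": 1, "text": "stay safe everyone"},
    {"id": 3, "root_id": 3, "text": "new phone launch today"},
    {"id": 4, "root_id": 3, "text": "launch price looks high"},
]

def merge_by_thread(posts):
    """Concatenate the texts of all posts that share the same root post."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post["root_id"]].append(post["text"])
    return {root: " ".join(texts) for root, texts in grouped.items()}

long_texts = merge_by_thread(posts)
for root, text in long_texts.items():
    print(root, text)
```

The merged texts are longer and less sparse than individual posts, so they can then be passed to a conventional frequency-based or density-based clustering step.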

