Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm

To address sparse keywords and similarity drift in short text segments, this paper proposes a short text clustering algorithm with feature keyword expansion (STCAFKE). The method expands feature keywords using HowNet and clusters the expanded texts by combining the K-means algorithm with a density-based algorithm. Keyword expansion increases the number of keywords per text and enriches its semantic features. Experimental results show that the algorithm improves short text clustering quality in terms of precision and recall.
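As a rough illustration of why keyword expansion helps with sparse short texts, the sketch below expands each text with related words before comparing texts. It is not the STCAFKE implementation: the `RELATED` dictionary is an invented toy stand-in for a HowNet-style lexicon, and Jaccard similarity is an assumed measure chosen only for brevity.

```python
# Toy stand-in for a HowNet-style lexicon (entries invented for illustration).
RELATED = {
    "car": {"vehicle", "automobile"},
    "automobile": {"vehicle", "car"},
    "price": {"cost"},
    "cost": {"price"},
}

def expand(tokens, lexicon=RELATED):
    """Return the token set plus every related word found in the lexicon."""
    expanded = set(tokens)
    for tok in tokens:
        expanded |= lexicon.get(tok, set())
    return expanded

def jaccard(a, b):
    """Jaccard similarity of two token collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

doc1 = ["car", "price"]
doc2 = ["automobile", "cost"]

raw_sim = jaccard(doc1, doc2)                  # no shared surface words
exp_sim = jaccard(expand(doc1), expand(doc2))  # overlap appears after expansion
```

The two toy texts share no words before expansion, so any distance-based clusterer such as K-means would treat them as unrelated. After expansion they overlap and can be grouped together.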

Short Text Clustering Algorithm Based on Frequent Closed Word Sets

2019 12th International Symposium on Computational Intelligence and Design (ISCID) ◽

10.1109/iscid.2019.10144 ◽

2019 ◽

Author(s):

Chunxia Jin ◽

Qiuchan Bai

Keyword(s):

Clustering Algorithm ◽

Text Clustering ◽

Short Text ◽

Short Text Clustering

An Improved Method of Short Text Feature Extraction Based on Words Co-Occurrence

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.519-520.842 ◽

2014 ◽

Vol 519-520 ◽

pp. 842-845 ◽

Cited By ~ 1

Author(s):

Li Hong Wang

Keyword(s):

Feature Extraction ◽

Chinese Text ◽

Traditional Method ◽

Low Frequency ◽

Text Clustering ◽

Improved Method ◽

Short Text ◽

Text Feature ◽

Short Text Clustering

In Chinese text clustering, short text differs greatly from traditional long text, principally in the low frequency of words. As a result, traditional text feature extraction and weight calculation methods are not directly suitable for short text clustering. To solve the problem of clustering drift in short text segments, this paper proposes a feature extraction method that improves weight calculation based on word co-occurrence. Experiments show that the method achieves better performance in Chinese short text clustering than the traditional TF-IDF method.
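The abstract does not give the weighting formula, so the following is only a minimal sketch of one way co-occurrence can feed into term weights. The formula used here (term frequency times one plus the average corpus-level co-occurrence count with the other words of the same text) and the function names are assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count how many documents contain each unordered word pair."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

def cooccurrence_weights(doc, pairs):
    """Assumed weighting: tf * (1 + mean pair count with the doc's other words)."""
    tf = Counter(doc)
    vocab = sorted(tf)
    weights = {}
    for w in vocab:
        others = [o for o in vocab if o != w]
        if others:
            total = sum(pairs[tuple(sorted((w, o)))] for o in others)
            avg = total / len(others)
        else:
            avg = 0.0
        weights[w] = tf[w] * (1.0 + avg)
    return weights

docs = [
    ["phone", "battery", "life"],
    ["phone", "battery", "charger"],
    ["phone", "screen"],
]
pairs = cooccurrence_counts(docs)
weights = cooccurrence_weights(docs[0], pairs)
```

In a short text every word typically occurs once, so raw term frequency cannot separate the words. In this toy corpus, "battery" and "phone" receive a higher weight than "life" because they co-occur across more texts.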

Research on Hadoop-based massive short text clustering algorithm

Fourth International Workshop on Pattern Recognition ◽

10.1117/12.2540380 ◽

2019 ◽

Author(s):

Qiang Zhao ◽

Yuliang Shi ◽

Zepeng Qing

Keyword(s):

Clustering Algorithm ◽

Text Clustering ◽

Short Text ◽

Short Text Clustering

Micro-blog Short Text Clustering Algorithm Based on Bootstrapping

2019 12th International Symposium on Computational Intelligence and Design (ISCID) ◽

10.1109/iscid.2019.10143 ◽

2019 ◽

Author(s):

Chunxia Jin ◽

Su Zhang

Keyword(s):

Clustering Algorithm ◽

Text Clustering ◽

Short Text ◽

Short Text Clustering

A Scalable Short-Text Clustering Algorithm Using Apache Spark

10.1109/ictai52525.2021.00149 ◽

2021 ◽

Author(s):

Leonidas Akritidis ◽

Miltiadis Alamaniotis ◽

Athanasios Fevgas ◽

Panayiotis Bozanis

Keyword(s):

Clustering Algorithm ◽

Text Clustering ◽

Apache Spark ◽

Short Text ◽

Short Text Clustering

BTM and GloVe Similarity Linear Fusion-Based Short Text Clustering Algorithm for Microblog Hot Topic Discovery

IEEE Access ◽

10.1109/access.2020.2973430 ◽

2020 ◽

Vol 8 ◽

pp. 32215-32225 ◽

Cited By ~ 2

Author(s):

Di Wu ◽

Mengtian Zhang ◽

Chao Shen ◽

Zhuyun Huang ◽

Mingxing Gu

Keyword(s):

Clustering Algorithm ◽

Text Clustering ◽

Short Text ◽

Topic Discovery ◽

Short Text Clustering

Confronting Sparseness and High Dimensionality in Short Text Clustering via Feature Vector Projections

2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) ◽

10.1109/ictai50040.2020.00129 ◽

2020 ◽

Author(s):

Leonidas Akritidis ◽

Miltiadis Alamaniotis ◽

Athanasios Fevgas ◽

Panayiotis Bozanis

Keyword(s):

Feature Vector ◽

Text Clustering ◽

High Dimensionality ◽

Short Text ◽

Short Text Clustering

Short-Text Clustering using Statistical Semantics

Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion ◽

10.1145/2740908.2742474 ◽

2015 ◽

Cited By ~ 12

Author(s):

Sepideh Seifzadeh ◽

Ahmed K. Farahat ◽

Mohamed S. Kamel ◽

Fakhri Karray

Keyword(s):

Text Clustering ◽

Short Text ◽

Short Text Clustering

Short Text Clustering Algorithms for Weibo Topic Detection

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.971-973.1747 ◽

2014 ◽

Vol 971-973 ◽

pp. 1747-1751 ◽

Cited By ~ 1

Author(s):

Lei Zhang ◽

Hai Qiang Chen ◽

Wei Jie Li ◽

Yan Zhao Liu ◽

Run Pu Wu

Keyword(s):

Text Analysis ◽

Semantic Information ◽

Clustering Algorithms ◽

Text Clustering ◽

Massive Data ◽

Topic Detection ◽

Clustering Methods ◽

Short Text ◽

Short Text Clustering ◽

Application Requirements

Text clustering is a popular research topic in text mining, and many text clustering methods now exist for different application requirements. Weibo data is currently acquired through the APIs provided by the major microblogging platforms. This paper discusses algorithms that extract popular topics from Weibo posts by text clustering after massive data collection. Because traditional text analysis may not be applicable to the short texts used on Weibo, multiple posts are first combined into long texts based on their features (forwards, comments, followers, etc.) before clustering. Either frequency-based or density-based short text clustering works in most cases: the former is suited to finding hot topics in large collections of Weibo short texts, and the latter to finding abnormal content. Both methods use semantic information to improve clustering accuracy and use parallelism to improve clustering performance.
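The idea of combining related posts into longer texts and then looking for frequent terms can be sketched as follows. The record layout, the `root_id` field (the original post a forward or comment belongs to), and the sample posts are hypothetical, and simple term counting stands in for the frequency-based clustering the abstract mentions. The semantic and parallel parts are not covered.

```python
from collections import Counter, defaultdict

# Hypothetical post records: (post_id, root_id, text).
posts = [
    (1, 1, "typhoon warning issued"),
    (2, 1, "typhoon hits coast"),
    (3, 3, "new phone released"),
    (4, 1, "stay safe typhoon"),
]

def merge_by_root(posts):
    """Concatenate the tokens of all posts that share the same root post."""
    merged = defaultdict(list)
    for _post_id, root_id, text in posts:
        merged[root_id].extend(text.split())
    return dict(merged)

def top_terms(tokens, k=1):
    """Return the k most frequent terms in a merged long text."""
    return [term for term, _count in Counter(tokens).most_common(k)]

long_texts = merge_by_root(posts)
hot = top_terms(long_texts[1])
```

Merging gives each thread enough tokens for frequency statistics to be meaningful. In the toy data above, the thread rooted at post 1 yields "typhoon" as its most frequent term.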
