Topic Detection from Short Text: A Term-based Consensus Clustering method

Author(s):  
Hao Lin ◽  
Bo Sun ◽  
Junjie Wu ◽  
Haitao Xiong
IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 9225-9231 ◽  
Author(s):  
Chengde Zhang ◽  
Shaozhen Lu ◽  
Chengming Zhang ◽  
Xia Xiao ◽  
Qian Wang ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Xin Jin ◽  
Jun Wang ◽  
Lina Ge ◽  
Qing Hu

Objective: Sciatica is a form of neuropathic pain that has been associated with an inflammatory response. We aimed to identify significant immune-related biomarkers for sciatica in peripheral blood. Methods: We used the GSE150408 expression profiling data from the Gene Expression Omnibus (GEO) database as the training dataset and extracted immune-related genes for further analysis. Differentially expressed immune-related genes (DEIRGs) between healthy controls and patients with sciatica were selected using the “limma” package and verified in clinical specimens by quantitative reverse transcription PCR (RT-qPCR). A diagnostic immune-related gene signature was established using the training model and random forest (RF), generalized linear model (GLM), and support vector machine (SVM) models. Sciatica patient subtypes were identified using the consensus clustering method. Results: Thirteen significant DEIRGs were obtained, of which five (CRP, EREG, FAM19A4, RLN1, and WFIKKN1) were selected to establish a diagnostic immune-related gene signature according to the most appropriate training model, namely the RF model. A nomogram for clinical application was established based on the expression levels of the five DEIRGs. The sciatica patients were divided into two subtypes (C1 and C2) by the consensus clustering method. Conclusions: Our research established a diagnostic signature of five immune-related genes to discriminate sciatica and identified two sciatica subtypes, which may benefit the clinical diagnosis and treatment of sciatica.
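The consensus clustering step mentioned in this abstract can be illustrated with a generic sketch. It is not the authors' pipeline: the abstract does not say which implementation, base clusterer, or parameters were used. The version below assumes a simple co-association approach, in which samples that share a cluster in more than half of the base partitions are linked. The function names, the threshold, and the three base partitions are all hypothetical.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0 / len(partitions)
    return matrix


def consensus_clusters(partitions, threshold=0.5):
    """Link samples whose co-association exceeds the threshold and
    return the connected components as consensus labels."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six samples (illustrative only).
# Label names differ between runs; only co-membership matters.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
consensus = consensus_clusters(base_partitions)
```

Working on co-membership rather than raw labels makes the result independent of how each base run happens to number its clusters, which is why the third partition above agrees with the first despite swapped labels.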


2014 ◽  
Vol 971-973 ◽  
pp. 1747-1751 ◽  
Author(s):  
Lei Zhang ◽  
Hai Qiang Chen ◽  
Wei Jie Li ◽  
Yan Zhao Liu ◽  
Run Pu Wu

Text clustering is a popular research topic in text mining, and many text clustering methods now cater to different application requirements. Weibo data are currently acquired through the APIs provided by the major microblogging platforms. In this paper, we discuss an algorithm that extracts popular topics posted by Weibo users by clustering text after large-scale data collection. Because traditional text analysis may not be applicable to the short texts used on Weibo, clustering is carried out by combining multiple posts into long texts based on their features (forwards, comments, followers, etc.). Either frequency-based or density-based short text clustering works in most cases. The former is suited to finding hot topics in large collections of Weibo short texts, and the latter to finding abnormal content. Both methods use semantic information to improve clustering accuracy, and both improve clustering performance through parallelism.
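The idea of combining short posts into longer texts before a frequency-based analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the paper: the `thread_id` field (standing in for a forward or comment relationship), the whitespace tokenizer, and the sample posts are all hypothetical, and real Weibo text would need a Chinese word segmenter.

```python
from collections import Counter, defaultdict


def pool_posts(posts):
    """Concatenate short posts that share a thread into one pseudo-document."""
    pooled = defaultdict(list)
    for post in posts:
        pooled[post["thread_id"]].append(post["text"])
    return {thread: " ".join(texts) for thread, texts in pooled.items()}


def hot_terms(documents, top_k=3):
    """Rank terms by the number of pooled documents they appear in."""
    doc_freq = Counter()
    for text in documents.values():
        # Count each term once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    return doc_freq.most_common(top_k)


# Hypothetical posts; "thread_id" is an assumed grouping key.
posts = [
    {"thread_id": "a", "text": "typhoon warning issued"},
    {"thread_id": "a", "text": "typhoon nears coast"},
    {"thread_id": "b", "text": "typhoon closes schools"},
    {"thread_id": "c", "text": "new phone released"},
]
pooled = pool_posts(posts)
top = hot_terms(pooled, top_k=1)
```

Counting document frequency over pooled threads, instead of raw term counts over individual posts, keeps one long thread from dominating the ranking.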


2020 ◽  
Vol 9 (20) ◽  
pp. 7488-7502
Author(s):  
Qidong Cai ◽  
Boxue He ◽  
Hui Xie ◽  
Pengfei Zhang ◽  
Xiong Peng ◽  
...  

2019 ◽  
Vol 52 (9-10) ◽  
pp. 1289-1298 ◽  
Author(s):  
Lei Shi ◽  
Gang Cheng ◽  
Shang-ru Xie ◽  
Gang Xie

The aim of topic detection is to automatically identify events and hot topics in social networks and to continuously track known topics. Applying traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis is difficult given the high dimensionality of massive event texts and the short-text sparsity of social networks. The sparse distribution of topics also leads to unclear topics. To address these challenges, we propose a novel word embedding topic model, named Cbow Topic Model (CTM), that combines a topic model with the continuous bag-of-words (Cbow) word embedding method for topic detection and summary in social networks. We cluster similar words in the target social network text dataset by introducing the classic Cbow word vectorization method, which can effectively learn the internal relationships between words and reduce the dimensionality of the input texts. We employ the topic model to model short texts, which effectively weakens the sparsity problem of social network texts. To detect and summarize topics, we propose a topic detection method that leverages similarity computing for social networks. We collected a Sina microblog dataset to conduct various experiments. The experimental results demonstrate that the CTM method is superior to existing topic model methods.
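The similarity computation over word vectors can be sketched in a few lines. This is not the CTM model itself, whose details the abstract does not give. It assumes that a post and each topic are represented by the mean of their word vectors and that the post is assigned to the topic with the highest cosine similarity. The two-dimensional vectors are hand-written stand-ins for Cbow-trained embeddings, and all names are hypothetical.

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_topic(post_words, topics, embeddings):
    """Return the topic whose mean word vector is most similar to the post."""
    post_vec = mean_vector(post_words, embeddings)
    if post_vec is None:
        return None
    scores = {}
    for name, words in topics.items():
        topic_vec = mean_vector(words, embeddings)
        if topic_vec is not None:
            scores[name] = cosine(post_vec, topic_vec)
    return max(scores, key=scores.get) if scores else None


# Toy vectors standing in for trained embeddings (assumed values).
embeddings = {
    "storm": [0.9, 0.1], "flood": [0.8, 0.2], "rain": [0.7, 0.3],
    "match": [0.1, 0.9], "goal": [0.2, 0.8], "team": [0.15, 0.85],
}
topics = {"weather": ["storm", "flood"], "sports": ["match", "goal"]}
weather_post = nearest_topic(["rain", "unseenword"], topics, embeddings)
sports_post = nearest_topic(["team"], topics, embeddings)
```

Out-of-vocabulary words are skipped, so a post made up entirely of unseen words is left unassigned rather than forced into a topic.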

