Topic Detection from Short Text: A Term-based Consensus Clustering method

Objective: Sciatica is a form of neuropathic pain that has been associated with the inflammatory response. We aimed to identify significant immune-related biomarkers for sciatica in peripheral blood. Methods: We used the GSE150408 expression profiling data from the Gene Expression Omnibus (GEO) database as the training dataset and extracted immune-related genes for further analysis. Differentially expressed immune-related genes (DEIRGs) between healthy controls and patients with sciatica were selected using the “limma” package and verified in clinical specimens by quantitative reverse transcription PCR (RT-qPCR). A diagnostic immune-related gene signature was established by training random forest (RF), generalized linear model (GLM), and support vector machine (SVM) models. Sciatica patient subtypes were identified using the consensus clustering method. Results: Thirteen significant DEIRGs were identified, of which five (CRP, EREG, FAM19A4, RLN1, and WFIKKN1) were selected to establish a diagnostic immune-related gene signature according to the most appropriate training model, namely the RF model. A nomogram for clinical application was built from the expression levels of the five DEIRGs. The sciatica patients were divided into two subtypes (C1 and C2) by the consensus clustering method. Conclusions: Our research established a diagnostic signature of five immune-related genes to discriminate sciatica and identified two sciatica subtypes, which may benefit the clinical diagnosis and treatment of sciatica.
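The consensus clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the sample values are invented, the base clusterer is a plain 1-D k-means, and the final grouping links pairs whose consensus exceeds a threshold instead of the hierarchical step usually applied to the consensus matrix. All function names are hypothetical.

```python
import random

def kmeans_1d(values, k, rng, iters=20):
    """Plain 1-D k-means; returns one cluster label per value."""
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def consensus_matrix(values, k=2, runs=50, frac=0.8, seed=0):
    """Fraction of resampled runs in which each pair of samples co-clusters."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        # Cluster a random subsample, then record which pairs ended up together.
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans_1d([values[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def subtypes(matrix, threshold=0.5):
    """Simplified final step: link samples whose consensus exceeds the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Invented expression scores for six samples (illustration only).
scores = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
matrix = consensus_matrix(scores)
subtype_labels = subtypes(matrix)
```

Here the two well-separated groups of scores end up in two subtypes, analogous to the C1 and C2 split reported above; real expression data would be multi-dimensional and far noisier.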


Short Text Clustering Algorithms for Weibo Topic Detection

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.971-973.1747 ◽ 2014 ◽ Vol 971-973 ◽ pp. 1747-1751 ◽ Cited By ~ 1

Author(s): Lei Zhang ◽ Hai Qiang Chen ◽ Wei Jie Li ◽ Yan Zhao Liu ◽ Run Pu Wu

Keyword(s): Text Analysis ◽ Semantic Information ◽ Clustering Algorithms ◽ Text Clustering ◽ Massive Data ◽ Topic Detection ◽ Clustering Methods ◽ Short Text ◽ Short Text Clustering ◽ Application Requirements

Text clustering is a popular research topic in text mining, and many text clustering methods now cater to different application requirements. Currently, Weibo data is acquired through the APIs provided by the major microblogging platforms. In this paper, we discuss algorithms for extracting popular topics posted by Weibo users by applying text clustering after massive data collection. Because traditional text analysis may not be applicable to the short texts used on Weibo, text clustering is carried out by combining multiple posts into long texts based on their features (forwards, comments, followers, etc.). Either frequency-based or density-based short text clustering works in most cases. The former is suited to finding hot topics in large collections of Weibo short texts, and the latter to finding abnormal content. Both methods use semantic information to improve clustering accuracy, and both improve clustering performance through parallelism.
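The idea of combining related posts into longer texts before looking for frequent terms can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the sample posts are invented, the hypothetical `thread_id` stands in for whatever relation (forward, comment, follower) links posts, and plain term counts replace the semantic information the paper mentions.

```python
from collections import Counter, defaultdict

# Invented sample data: (thread_id, text). thread_id is a hypothetical key
# for the relation that ties short posts together.
posts = [
    ("t1", "earthquake hits city"),
    ("t1", "earthquake rescue teams arrive"),
    ("t2", "new phone release today"),
    ("t1", "city earthquake damage report"),
    ("t2", "phone release price leaked"),
]

def pool_posts(posts):
    """Concatenate the tokens of short posts that share a thread_id."""
    pooled = defaultdict(list)
    for thread_id, text in posts:
        pooled[thread_id].extend(text.lower().split())
    return dict(pooled)

def top_terms(pooled, n=2):
    """Return the n most frequent terms of each pooled document."""
    return {
        thread_id: [term for term, _ in Counter(tokens).most_common(n)]
        for thread_id, tokens in pooled.items()
    }

pooled = pool_posts(posts)
hot_terms = top_terms(pooled)
```

Pooling gives each document enough tokens for term frequencies to be informative, which is the motivation the abstract gives for merging short posts.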


Identification of a novel prognostic DNA methylation signature for lung adenocarcinoma based on consensus clustering method

Cancer Medicine ◽ 10.1002/cam4.3343 ◽ 2020 ◽ Vol 9 (20) ◽ pp. 7488-7502

Author(s): Qidong Cai ◽ Boxue He ◽ Hui Xie ◽ Pengfei Zhang ◽ Xiong Peng ◽ ...

Keyword(s): DNA Methylation ◽ Lung Adenocarcinoma ◽ Consensus Clustering ◽ Clustering Method


A Semantic-Based Short-Text Fast Clustering Method on Hotline Records in Chengdu

2019 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech) ◽ 10.1109/dasc/picom/cbdcom/cyberscitech.2019.00103 ◽ 2019

Author(s): Xiaorong Pu ◽ Kun Long ◽ Kecheng Chen ◽ Mei Xie ◽ Jiancheng Lv ◽ ...

Keyword(s): Clustering Method ◽ Short Text


A word embedding topic model for topic detection and summary in social networks

Measurement and Control ◽ 10.1177/0020294019865750 ◽ 2019 ◽ Vol 52 (9-10) ◽ pp. 1289-1298 ◽ Cited By ~ 1

Author(s): Lei Shi ◽ Gang Cheng ◽ Shang-ru Xie ◽ Gang Xie

Keyword(s): Social Networks ◽ Social Network ◽ Latent Dirichlet Allocation ◽ Semantic Analysis ◽ Topic Model ◽ Word Embedding ◽ Probabilistic Latent Semantic Analysis ◽ Topic Detection ◽ Short Text ◽ Internal Relationship

The aim of topic detection is to automatically identify events and hot topics in social networks and to continuously track known topics. Applying traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis is difficult given the high dimensionality of massive event texts and the short-text sparsity of social networks. The sparse distribution of topics also leads to unclear topics. To address these challenges, we propose a novel word embedding topic model, named the Cbow Topic Model (CTM), which combines a topic model with the continuous bag-of-words (Cbow) word embedding method for topic detection and summary in social networks. We cluster similar words in the target social network text dataset using the classic Cbow word vectorization method, which effectively learns the internal relationships between words and reduces the dimensionality of the input texts. We then use the topic model to model the short texts, which mitigates the sparsity problem of social network texts. To detect and summarize topics, we propose a topic detection method based on similarity computation for social networks. We collected a Sina microblog dataset and conducted various experiments on it. The experimental results demonstrate that the CTM method is superior to existing topic model methods.
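The similarity-based detection step can be illustrated with a small sketch. It is not the CTM method itself: the 2-D word vectors are hand-written stand-ins for embeddings that would be learned with Cbow, the topic model is omitted, and the greedy threshold grouping in the hypothetical `assign_topics` is an assumption chosen for brevity.

```python
import math

# Toy 2-D word vectors, hand-written for illustration only; in the approach
# described above they would be learned from the corpus with Cbow.
toy_vectors = {
    "match": (0.9, 0.1), "goal": (0.8, 0.2), "team": (0.85, 0.15),
    "stock": (0.1, 0.9), "market": (0.2, 0.8), "price": (0.15, 0.85),
}

def text_vector(text):
    """Average the vectors of the known words in a short text."""
    vecs = [toy_vectors[w] for w in text.lower().split() if w in toy_vectors]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_topics(texts, threshold=0.9):
    """Greedy single pass: join the first topic whose seed text is similar
    enough, otherwise start a new topic."""
    seeds, labels = [], []
    for text in texts:
        vec = text_vector(text)
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels

labels = assign_topics(["team goal", "stock market", "match team", "market price"])
```

Averaging dense word vectors gives every short text a representation of fixed, low dimension, which is one way to read the abstract's claim that word embeddings reduce dimensionality and ease sparsity.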
