Topic discovery method based on topic model combined with hierarchical clustering

In order to discover the typical process route (TPR) in the Computer Aided Process Planning (CAPP) database, Hierarchical Clustering algorithm is adopted. A mathematics model based on the data matrix was built to describe the process route (PR). On the base of the operation code, the distance between operations, between the PRs, between clusters were measured to evaluate the PR similarity. Then, the PR clusters were eventually merged by the hierarchical clustering algorithm. Three methods are listed to confirm the clustering granularity that determines the clustering result. This TPR discovery method is successfully applied to discover the axle sleeves’ TPR.

Download Full-text

A Sparse Topic Model for Bursty Topic Discovery in Social Networks

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/5/15 ◽

2020 ◽

Vol 17 (5) ◽

pp. 816-824

Author(s):

Lei Shi ◽

Junping Du ◽

Feifei Kou

Keyword(s):

Topic Model ◽

State Of The Art ◽

Sina Weibo ◽

Short Text ◽

Qualitative And Quantitative ◽

Topic Discovery ◽

Spike And Slab Prior ◽

Automatic Discovery ◽

The Common ◽

Sparse Topic Model

Bursty topic discovery aims to automatically identify bursty events and continuously keep track of known events. The existing methods focus on the topic model. However, the sparsity of short text brings the challenge to the traditional topic models because the words are too few to learn from the original corpus. To tackle this problem, we propose a Sparse Topic Model (STM) for bursty topic discovery. First, we distinguish the modeling between the bursty topic and the common topic to detect the change of the words in time and discover the bursty words. Second, we introduce “Spike and Slab” prior to decouple the sparsity and smoothness of a distribution. The bursty words are leveraged to achieve automatic discovery of the bursty topics. Finally, to evaluate the effectiveness of our proposed algorithm, we collect Sina weibo dataset to conduct various experiments. Both qualitative and quantitative evaluations demonstrate that the proposed STM algorithm outperforms favorably against several state-of-the-art methods

Download Full-text

Use of a Latent Topic Model for Characteristic Extraction from Health Checkup Questionnaire Data

Methods of Information in Medicine ◽

10.3414/me15-01-0023 ◽

2015 ◽

Vol 54 (06) ◽

pp. 515-521 ◽

Cited By ~ 4

Author(s):

I. Miyano ◽

H. Kataoka ◽

N. Nakajima ◽

T. Watabe ◽

N. Yasuda ◽

...

Keyword(s):

Hierarchical Clustering ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Probability Model ◽

Model Parameters ◽

Questionnaire Data ◽

Health Checkup ◽

Subjective Data ◽

Latent Topic ◽

Study Participants

Summary Objectives: When patients complete questionnaires during health checkups, many of their responses are subjective, making topic extraction difficult. Therefore, the purpose of this study was to develop a model capable of extracting appropriate topics from subjective data in questionnaires conducted during health checkups. Methods: We employed a latent topic model to group the lifestyle habits of the study participants and represented their responses to items on health checkup questionnaires as a probability model. For the probability model, we used latent Dirichlet allocation to extract 30 topics from the questionnaires. According to the model parameters, a total of 4381 study participants were then divided into groups based on these topics. Results from laboratory tests, including blood glucose level, triglycerides, and estimated glomerular filtration rate, were compared between each group, and these results were then compared with those obtained by hierarchical clustering. Results: If a significant (p < 0.05) difference was observed in any of the laboratory measurements between groups, it was considered to indicate a questionnaire response pattern corresponding to the value of the test result. A comparison between the latent topic model and hierarchical clustering grouping revealed that, in the latent topic model method, a small group of participants who reported having subjective signs of uri-nary disorder were allocated to a single group. Conclusions: The latent topic model is useful for extracting characteristics from a small number of groups from questionnaires with a large number of items. These results show that, in addition to chief complaints and history of past illness, questionnaire data obtained during medical checkups can serve as useful judgment criteria for assessing the conditions of patients.

Download Full-text

SenU-PTM: a novel phrase-based topic model for short-text topic discovery by exploiting word embeddings

Data Technologies and Applications ◽

10.1108/dta-02-2021-0039 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Heng-Yang Lu ◽

Yi Zhang ◽

Yuntao Du

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Model ◽

Second Phase ◽

Word Embeddings ◽

Two Phase ◽

Content Type ◽

Short Text ◽

Topic Discovery ◽

Two Phases ◽

Sense Unit

PurposeTopic model has been widely applied to discover important information from a vast amount of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also exist the readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model called the Sense Unit based Phrase Topic Model (SenU-PTM) for both the sparsity and readability problems.Design/methodology/approachSenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm by exploiting word embeddings, which aims to generate phrases with the original corpus. The second phase introduces a new concept of sense unit, which consists of a set of semantically similar tokens for modeling topics with token vectors generated in the first phase. Finally, SenU-PTM infers topics based on the above two phases.FindingsExperimental results on two real-world and publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. It reveals that modeling topics on sense units can solve the sparsity of short texts and improve the readability of topics at the same time.Originality/valueThe originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.

Download Full-text

Classification aware neural topic model for COVID-19 disinformation categorisation

PLoS ONE ◽

10.1371/journal.pone.0247086 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0247086

Author(s):

Xingyi Song ◽

Johann Petrak ◽

Ye Jiang ◽

Iknoor Singh ◽

Diana Maynard ◽

...

Keyword(s):

Public Health ◽

Topic Model ◽

Medical Science ◽

Policy Makers ◽

Media Type ◽

Topic Discovery ◽

Extensive Analysis ◽

Public Health Messages ◽

Fact Checking ◽

Effective Public Health

The explosion of disinformation accompanying the COVID-19 pandemic has overloaded fact-checkers and media worldwide, and brought a new major challenge to government responses worldwide. Not only is disinformation creating confusion about medical science amongst citizens, but it is also amplifying distrust in policy makers and governments. To help tackle this, we developed computational methods to categorise COVID-19 disinformation. The COVID-19 disinformation categories could be used for a) focusing fact-checking efforts on the most damaging kinds of COVID-19 disinformation; b) guiding policy makers who are trying to deliver effective public health messages and counter effectively COVID-19 disinformation. This paper presents: 1) a corpus containing what is currently the largest available set of manually annotated COVID-19 disinformation categories; 2) a classification-aware neural topic model (CANTM) designed for COVID-19 disinformation category classification and topic discovery; 3) an extensive analysis of COVID-19 disinformation categories with respect to time, volume, false type, media type and origin source.

Download Full-text

Research on Hot Topic Discovery Technology of Micro-blog Based on Biterm Topic Model

Communications in Computer and Information Science - Geo-Spatial Knowledge and Intelligence ◽

10.1007/978-981-10-3969-0_27 ◽

2017 ◽

pp. 234-244

Author(s):

Jun Feng ◽

Yu Fang

Keyword(s):

Topic Model ◽

Topic Discovery

Download Full-text

Detect Text Topics by Semantics Graphs

10.5121/csit.2021.110806 ◽

2021 ◽

Author(s):

Alex Romanova

Keyword(s):

Deep Learning ◽

Image Classification ◽

Graph Model ◽

Word Embedding ◽

Topic Analysis ◽

Topic Discovery ◽

Model Finding ◽

As Graph ◽

Discovery Method ◽

Graph Capacity

It is beneficial for document topic analysis to build a bridge between word embedding process and graph capacity to connect the dots and represent complex correlations between entities. In this study we examine processes of building a semantic graph model, finding document topics and validating topic discovery. We introduce a novel Word2Vec2Graph model that is built on top of Word2Vec word embedding model. We demonstrate how this model can be used to analyze long documents and uncover document topics as graph clusters. To validate topic discovery method we transfer words to vectors and vectors to images and use deep learning image classification.

Download Full-text

A short text topic discovery method for social network

Proceedings of the 33rd Chinese Control Conference ◽

10.1109/chicc.2014.6896676 ◽

2014 ◽

Cited By ~ 1

Author(s):

Jia Liu ◽

Qinglin Wang ◽

Yu Liu ◽

Yuan Li

Keyword(s):

Social Network ◽

Short Text ◽

Topic Discovery ◽

Discovery Method

Download Full-text

A text semantic topic discovery method based on the conditional co-occurrence degree

Neurocomputing ◽

10.1016/j.neucom.2019.08.047 ◽

2019 ◽

Vol 368 ◽

pp. 11-24 ◽

Cited By ~ 13

Author(s):

Wei Wei ◽

Chonghui Guo

Keyword(s):

Topic Discovery ◽

Discovery Method

Download Full-text

Semantics Graph Mining for Topic Discovery and Word Associations

International Journal of Data Mining & Knowledge Management Process ◽

10.5121/ijdkp.2021.11401 ◽

2021 ◽

Vol 11 (04) ◽

pp. 01-14

Author(s):

Alex Romanova

Keyword(s):

Data Mining ◽

Big Data ◽

Graph Mining ◽

Graph Model ◽

Word Embedding ◽

Text Data ◽

Word Associations ◽

Topic Discovery ◽

Discovery Method ◽

Graph Capacity

Big Data creates many challenges for data mining experts, in particular in getting meanings of text data. It is beneficial for text mining to build a bridge between word embedding process and graph capacity to connect the dots and represent complex correlations between entities. In this study we examine processes of building a semantic graph model to determine word associations and discover document topics. We introduce a novel Word2Vec2Graph model that is built on top of Word2Vec word embedding model. We demonstrate how this model can be used to analyze long documents, get unexpected word associations and uncover document topics. To validate topic discovery method we transfer words to vectors and vectors to images and use CNN deep learning image classification.

Download Full-text