Probabilistic Topic Modeling for Comparative Analysis of Document Collections

2020 ◽  
Vol 14 (2) ◽  
pp. 1-27
Author(s):  
Ting Hua ◽  
Chang-Tien Lu ◽  
Jaegul Choo ◽  
Chandan K. Reddy
2018 ◽  
Vol 110 (1) ◽  
pp. 85-101 ◽  
Author(s):  
Ronald Cardenas ◽  
Kevin Bello ◽  
Alberto Coronado ◽  
Elizabeth Villota

Abstract Managing large collections of documents is an important problem in many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method, and evaluating such models is an interesting problem in its own right. Topic interpretability measures have been developed in recent years as a more natural option for evaluating topic quality; they emulate human perception of coherence using correlation scores over word sets. In this paper, we show experimental evidence that topic coherence scores improve when the training corpus is restricted to the relevant information in each document, as extracted by Entity Recognition. We experiment with job advertisement data and find that with this approach topic models improve in interpretability by about 40 percentage points on average. Our analysis also reveals that, when the extracted text chunks are used, some redundant topics are merged while others are split into more skill-specific topics. Fine-grained topics observed in models trained on the whole text are preserved.
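The idea of scoring a topic by how strongly its top words correlate can be illustrated with a small sketch. The abstract does not say which coherence measure the authors used; the version below assumes normalized pointwise mutual information (NPMI) over document-level co-occurrence, which is one common choice. The function name `npmi_coherence`, the toy job-advertisement documents, and the handling of edge cases are all illustrative assumptions, not the paper's method.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are estimated from document-level co-occurrence.
    This is an assumed coherence measure, not necessarily the one
    used in the paper.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: lowest NPMI
        elif p12 == 1:
            scores.append(1.0)   # convention chosen here; -log(1) would divide by zero
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical tokenized job advertisements.
docs = [
    ["python", "sql", "data"],
    ["python", "sql"],
    ["java", "spring"],
    ["java", "spring", "sql"],
]
coherent = npmi_coherence(["python", "sql"], docs)
incoherent = npmi_coherence(["python", "java"], docs)
```

In this toy corpus, a topic whose top words tend to appear in the same advertisements scores higher than one whose top words never appear together, which is the behavior a coherence score is meant to capture.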


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1497
Author(s):  
Chankook Park ◽  
Minkyu Kim

To help scientists contribute to the development of renewable energy, it is important to examine in detail how academic research topics related to renewable energy are distributed and which topics are likely to receive new attention in the future. This study uses an advanced probabilistic topic modeling method to statistically examine temporal changes in renewable energy topics, using academic abstracts from 2010–2019, and explores the properties of the topics from the perspective of future signs such as weak signals. Among strong signals, methods for optimally integrating renewable energy into the power grid receive great attention. Among weak signals, interest in large-capacity energy storage systems such as hydrogen, supercapacitors, and compressed air energy storage showed a high rate of increase. Not-strong-but-well-known signals include comprehensive topics such as renewable energy potential, barriers, and policies. The approach of this study is applicable not only to renewable energy but also to other subjects.


2018 ◽  
Vol 11 (4) ◽  
pp. 77 ◽  
Author(s):  
Malek Mouhoub ◽  
Mustakim Al Helal

Topic modeling is a powerful technique for unsupervised analysis of large document collections. Topic models have a wide range of applications in text mining, information retrieval, and statistical language modeling, including tag recommendation, text categorization, keyword extraction, and similarity search. Research on topic modeling is steadily gaining popularity. Various efficient topic modeling techniques are available for English, as it is one of the most widely spoken languages in the world, but not for many other languages. Bangla is the seventh most spoken native language in the world by population, and it needs automation in many respects. This paper deals with finding the core topics of a Bangla news corpus and classifying news using similarity measures. The document models are built using LDA (Latent Dirichlet Allocation) with bigrams.
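A rough sketch of the two ingredients named above, bigram features and a similarity measure, is given here. It makes several assumptions: the abstract does not say which similarity measure is used, so cosine similarity is a stand-in; raw term counts stand in for the per-document topic proportions that an LDA model would produce; and the English tokens are placeholders for tokenized Bangla news text. The helper names `with_bigrams` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def with_bigrams(tokens):
    """Return the unigrams followed by adjacent-pair bigrams joined by '_'."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


# Placeholder documents; count vectors stand in for LDA topic proportions.
doc_a = Counter(with_bigrams(["prime", "minister", "visits", "dhaka"]))
doc_b = Counter(with_bigrams(["prime", "minister", "speaks", "dhaka"]))
doc_c = Counter(with_bigrams(["cricket", "team", "wins", "match"]))

sim_ab = cosine_similarity(doc_a, doc_b)  # two political stories
sim_ac = cosine_similarity(doc_a, doc_c)  # politics vs. sports
```

Adding bigrams lets a phrase such as `prime_minister` count as a feature of its own, so documents that share the phrase score as more similar than documents that merely share its individual words.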


2015 ◽  
Vol 16 (S6) ◽  
Author(s):  
Massimo La Rosa ◽  
Antonino Fiannaca ◽  
Riccardo Rizzo ◽  
Alfonso Urso
