Probabilistic Topic Modeling: Latest Research Papers

Topic Modeling Using Latent Dirichlet allocation

ACM Computing Surveys ◽

10.1145/3462478 ◽

2022 ◽

Vol 54 (7) ◽

pp. 1-35

Author(s):

Uttam Chauhan ◽

Apurva Shah

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Research Work ◽

Topic Models ◽

Small Subset ◽

Distributed Environment ◽

Future Directions ◽

Probabilistic Topic Modeling ◽

Modeling Techniques ◽

Evaluation Techniques

A mammoth text corpus cannot be handled without summarizing it into a relatively small subset, and a computational tool is needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains the structure of an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives. Research on topic modeling in a distributed environment and on topic visualization approaches is also explored. We also briefly cover implementation and evaluation techniques for topic models. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.
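For orientation, the generative process of Latent Dirichlet Allocation, the model named in this survey's title, is commonly written as below. The notation (K topics, D documents, N_d tokens in document d, hyperparameters α and β) is the usual textbook convention and is not taken from the abstract itself.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{align*}
```

Under this model each document is summarized by its K-dimensional topic mixture θ_d, which is one way to read the "topical subspace" mentioned in the abstract.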

Academic Topics Related to Household Energy Consumption Using the Future Sign Detection Technique

Energies ◽

10.3390/en14248446 ◽

2021 ◽

Vol 14 (24) ◽

pp. 8446

Author(s):

Minkyu Kim ◽

Chankook Park

Keyword(s):

Energy Consumption ◽

New Technologies ◽

Clean Energy ◽

Energy Performance ◽

Air Pollutant ◽

Pollutant Emissions ◽

Household Energy ◽

Household Energy Consumption ◽

Probabilistic Topic Modeling ◽

The Future

With the emergence of new technologies and policies to transition to clean energy, the household energy consumption sector is also changing. In response to policy, environmental, and technical changes, researchers need to find out which significant issues are related to household energy consumption, and comprehensively analyze which issues are likely to attract attention in the future, in order to contribute to research in the household sector. Based on the abstracts of academic papers published between 2011 and 2020, this study uses probabilistic topic modeling to increase understanding of academic issues in the household energy consumption sector and statistically reviews changes in issues over time. The analysis shows that topics related to digitalization and renewable energy, such as microgrid systems, smart homes, residential solar power generation systems, and non-intrusive load monitoring (NILM), belong to strong signals and are being actively studied. Weak signals, which may attract attention in the future, include discussions on coal energy consumption, air pollutant emissions, energy poverty, and energy performance evaluation. The results also show that carbon neutrality themes, such as decarbonization and fossil energy consumption reduction, are expanding into research in the household energy consumption sector.

Unearthing trends in environmental science and engineering research: Insights from a probabilistic topic modeling literature analysis

Journal of Cleaner Production ◽

10.1016/j.jclepro.2021.128322 ◽

2021 ◽

pp. 128322

Author(s):

Yazwand Palanichamy ◽

Mehdi Kargar ◽

Hossein Zolfagharinia

Keyword(s):

Topic Modeling ◽

Environmental Science ◽

Literature Analysis ◽

Science And Engineering ◽

Engineering Research ◽

Probabilistic Topic Modeling

A computational literature review of football performance analysis through probabilistic topic modeling

Artificial Intelligence Review ◽

10.1007/s10462-021-09998-8 ◽

2021 ◽

Author(s):

Vitor Ayres Principe ◽

Rodrigo Gomes de Souza Vale ◽

Juliana Brandão Pinto de Castro ◽

Luiz Marcelo Carvano ◽

Roberto André Pereira Henriques ◽

...

Keyword(s):

Performance Analysis ◽

Literature Review ◽

Topic Modeling ◽

Probabilistic Topic Modeling

A Study on the Characteristics of Academic Topics Related to Renewable Energy Using the Structural Topic Modeling and the Weak Signal Concept

Energies ◽

10.3390/en14051497 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1497

Author(s):

Chankook Park ◽

Minkyu Kim

Keyword(s):

Renewable Energy ◽

Energy Storage ◽

Topic Modeling ◽

Academic Research ◽

High Rate ◽

Weak Signals ◽

Energy Potential ◽

Rate Of Increase ◽

Probabilistic Topic Modeling ◽

To Receive

For scientists to contribute to the development of renewable energy, it is important to examine in detail how the distribution of academic research topics related to renewable energy is structured and which topics are likely to receive new attention in the future. This study uses an advanced probabilistic topic modeling method to statistically examine the temporal changes of renewable energy topics, using academic abstracts from 2010–2019, and explores the properties of the topics from the perspective of future signs such as weak signals. Among strong signals, methods for optimally integrating renewable energy into the power grid receive great attention. Among weak signals, interest in large-capacity energy storage systems such as hydrogen, supercapacitors, and compressed air energy storage showed a high rate of increase. Not-strong-but-well-known signals include comprehensive topics such as renewable energy potential, barriers, and policies. The approach of this study is applicable not only to renewable energy but also to other subjects.
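The signal categories described above can be illustrated with a small sketch. It assumes, purely for illustration, that each topic is labeled by two quantities: its average share of the corpus and its rate of increase over the period, each compared against the median across topics. The function name `classify_signals`, the median thresholds, the label "other" for the remaining quadrant, and the yearly shares are all hypothetical and are not taken from the paper.

```python
from statistics import median

def classify_signals(shares_by_year):
    """Label each topic by its average share and its rate of increase.

    shares_by_year maps a topic name to a list of yearly topic shares,
    oldest first. Median thresholds are an assumption of this sketch.
    """
    avg = {t: sum(s) / len(s) for t, s in shares_by_year.items()}
    # Rate of increase: relative change from the first to the last year
    # (assumes the first-year share is non-zero).
    growth = {t: (s[-1] - s[0]) / s[0] for t, s in shares_by_year.items()}
    avg_cut = median(avg.values())
    growth_cut = median(growth.values())

    labels = {}
    for t in shares_by_year:
        high_share = avg[t] > avg_cut
        high_growth = growth[t] > growth_cut
        if high_share and high_growth:
            labels[t] = "strong"
        elif high_growth:
            labels[t] = "weak"
        elif high_share:
            labels[t] = "well-known but not strong"
        else:
            labels[t] = "other"
    return labels

# Hypothetical yearly shares for four made-up topics.
example = {
    "grid integration": [0.30, 0.34, 0.40],
    "hydrogen storage": [0.02, 0.04, 0.08],
    "policy and barriers": [0.35, 0.33, 0.30],
    "misc": [0.10, 0.09, 0.08],
}
labels = classify_signals(example)
```

With these made-up numbers, a small but fast-growing topic lands in the weak quadrant, while a large but shrinking one is labeled well-known but not strong.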

Knowledge discovery in research on domestic violence: an overview of the last fifty years

Data Technologies and Applications ◽

10.1108/dta-08-2020-0179 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Marcio Pereira Basilio ◽

Valdecy Pereira ◽

Max William Coelho Moreira de Oliveira ◽

Antonio Fernandes da Costa Neto ◽

Orlinda Claudia Rosa de Moraes ◽

...

Keyword(s):

Domestic Violence ◽

Interpersonal Violence ◽

Partner Violence ◽

Latent Dirichlet Allocation ◽

Practical Implication ◽

The United States ◽

Content Type ◽

Probabilistic Topic Modeling ◽

Starting Point ◽

The University

Purpose: The database of the Web of Science (WoS) was searched for publications from January 1945–May 7, 2020 on the topic of domestic violence in titles, abstracts and keywords. The references were analyzed using the R bibliometrix package, and abstracts were analyzed using latent Dirichlet allocation (LDA) with collapsed Gibbs sampling to obtain topics related to domestic violence.

Design/methodology/approach: The aim of the study is to explore and provide an overview of research carried out on domestic violence, in its various aspects, over the past fifty years.

Findings: As a result of the research, the authors can assert that in the last fifty years, 32,298 authors have produced 19,495 documents on the theme of policing strategy and related subjects in 111 countries. Scientific production in this area grows at a rate of 12.81 per year. The United States of America is the leading country in publications with 48.14%, followed by the United Kingdom with 7.57% and Australia with 6.05%. Regarding universities, the highlight is the University of California with 664 publications, followed by the University of London with 515 and the University of North Carolina with 484. As for journals, the highlight is the Journal of Interpersonal Violence, Journal of Family Violence and Violence Against Women, which account for more than 14.32% of all indexed literature. Regarding the authors, the highlight is Campbell J.C. and Feder G. Probabilistic topic modeling revealed that 18% of the topics concentrate 90% of all tokens. Topic 1 accounts for 27.9% of the sample and conducts research related to intimate partner violence.

Practical implications: As a practical implication of using the LDA in the bibliographic review, we infer that its capacity to explore large masses of data allows the researcher to explore an infinitely greater amount than the traditional methods of systematic literature review.

Originality/value: The value of these studies is summarized in the presentation of an overview on the theme in the last fifty years, offering the opportunity for other researchers to use this research as a starting point for other analyses.
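The abstract names LDA with collapsed Gibbs sampling and reports how tokens concentrate in a few topics. A minimal sketch of that kind of sampler, using only the Python standard library, could look like the following. It is not the authors' implementation: the function name, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus of word ids are all assumptions made for illustration.

```python
import random

def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs are lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, topic_total

# Toy corpus: word ids 0-2 and 3-5 never co-occur in a document.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 4, 3, 5]]
doc_topic, topic_word, topic_total = lda_collapsed_gibbs(docs, 2, 6)

# Share of all tokens assigned to each topic, largest first.
n_tokens = sum(topic_total)
token_share = sorted((n / n_tokens for n in topic_total), reverse=True)
```

The sorted `token_share` list is the kind of quantity behind a statement such as "18% of the topics concentrate 90% of all tokens": accumulating the shares from the largest topic downward shows how many topics are needed to cover a given fraction of the corpus.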

Industrial Federated Topic Modeling

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3418283 ◽

2021 ◽

Vol 12 (1) ◽

pp. 1-22

Author(s):

Di Jiang ◽

Yongxin Tong ◽

Yuanfeng Song ◽

Xueyang Wu ◽

Weiwei Zhao ◽

...

Keyword(s):

Topic Modeling ◽

Data Privacy ◽

Topic Models ◽

Real Life ◽

Industrial Applications ◽

High Quality ◽

Heterogeneous Model ◽

Data Scarcity ◽

Probabilistic Topic Modeling ◽

Training Topic

Probabilistic topic modeling has been applied in a variety of industrial applications. Training a high-quality model usually requires a massive amount of data to provide comprehensive co-occurrence information for the model to learn. However, industrial data such as medical or financial records are often proprietary or sensitive, which precludes uploading to data centers. Hence, training topic models in industrial scenarios using conventional approaches faces a dilemma: A party (i.e., a company or institute) has to either tolerate data scarcity or sacrifice data privacy. In this article, we propose a framework named Industrial Federated Topic Modeling (iFTM), in which multiple parties collaboratively train a high-quality topic model by simultaneously alleviating data scarcity and maintaining immunity to privacy adversaries. iFTM is inspired by federated learning, supports two representative topic models (i.e., Latent Dirichlet Allocation and SentenceLDA) in industrial applications, and consists of novel techniques such as private Metropolis-Hastings, topic-wise normalization, and heterogeneous model integration. We conduct quantitative evaluations to verify the effectiveness of iFTM and deploy iFTM in two real-life applications to demonstrate its utility. Experimental results verify iFTM’s superiority over conventional topic modeling.

Urdu Documents Clustering with Unsupervised and Semi-Supervised Probabilistic Topic Modeling

Information ◽

10.3390/info11110518 ◽

2020 ◽

Vol 11 (11) ◽

pp. 518

Author(s):

Mubashar Mustafa ◽

Feng Zeng ◽

Hussain Ghulam ◽

Hafiz Muhammad Arslan

Keyword(s):

English Language ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Document Clustering ◽

Semantic Features ◽

Text Documents ◽

Proposed Model ◽

Probabilistic Topic Modeling ◽

Processing Techniques ◽

Dirichlet Allocation

Document clustering groups documents according to certain semantic features. Topic models have a richer semantic structure and considerable potential for helping users understand document corpora. Unfortunately, because topic models are purely unsupervised, this potential is stymied on text documents with overlapping categories. To solve this problem, some semi-supervised models have been proposed for the English language. However, no such work is available for Urdu, a low-resource language. Document clustering therefore remains a challenging task in Urdu, which has its own morphology, syntax, and semantics. In this study, we propose a semi-supervised framework for Urdu document clustering that deals with the challenges of Urdu morphology. The proposed model combines pre-processing techniques, a seeded-LDA model, and Gibbs sampling; we name it seeded-Urdu Latent Dirichlet Allocation (seeded-ULDA). We apply the proposed model and other methods to Urdu news datasets for categorization. Two dataset conditions are considered for document clustering: a “dataset without overlapping”, in which all classes are distinct, and a “dataset with overlapping”, in which the categories overlap and the classes are connected to each other. The aim of this study is threefold. First, it shows that unsupervised models (Latent Dirichlet Allocation (LDA), non-negative matrix factorization (NMF), and K-means) give satisfactory results on the dataset without overlapping. Second, it shows that these unsupervised models do not perform well on the dataset with overlapping because, on this dataset, the algorithms find some topics that are neither entirely meaningful nor effective in extrinsic tasks. Third, the proposed semi-supervised model, seeded-ULDA, performs well on both datasets because it offers a straightforward and effective way to direct topic models toward topics of specific interest. The paper shows that the semi-supervised model, seeded-ULDA, provides significant results compared with the unsupervised algorithms.
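One common way to seed a topic model is to give each topic a word prior that is larger at that topic's seed words. The sketch below illustrates that idea only; it is an assumption and not necessarily how seeded-ULDA works. The function name, the `base` and `boost` values, and the English stand-in vocabulary are hypothetical.

```python
def seeded_topic_word_prior(vocab, seed_words, base=0.01, boost=1.0):
    """Return one prior row per topic over vocab, boosted at seed words.

    seed_words holds one collection of seed words per topic.
    """
    index = {word: i for i, word in enumerate(vocab)}
    prior = []
    for seeds in seed_words:
        row = [base] * len(vocab)
        for word in seeds:
            if word in index:  # ignore seeds missing from the vocabulary
                row[index[word]] += boost
        prior.append(row)
    return prior

# Hypothetical English stand-ins for seed terms of two news categories.
vocab = ["match", "goal", "team", "vote", "party", "minister"]
seeds = [["match", "goal"], ["vote", "minister", "senate"]]
prior = seeded_topic_word_prior(vocab, seeds)
```

In a Gibbs sampler, a row of `prior` would take the place of the single symmetric word prior for that topic, so tokens of a seed word are drawn toward the seeded topic while unseeded words remain free to move.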

Probabilistic Topic Modeling for Comparative Analysis of Document Collections

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3369873 ◽

2020 ◽

Vol 14 (2) ◽

pp. 1-27

Author(s):

Ting Hua ◽

Chang-Tien Lu ◽

Jaegul Choo ◽

Chandan K. Reddy

Keyword(s):

Comparative Analysis ◽

Topic Modeling ◽

Document Collections ◽

Probabilistic Topic Modeling

An Empirical Study on Crosslingual Transfer in Probabilistic Topic Models

Computational Linguistics ◽

10.1162/coli_a_00369 ◽

2020 ◽

Vol 46 (1) ◽

pp. 95-134

Author(s):

Shudong Hao ◽

Michael J. Paul

Keyword(s):

Knowledge Transfer ◽

Empirical Study ◽

Future Development ◽

Topic Modeling ◽

Topic Models ◽

Training Corpus ◽

Training Conditions ◽

Probabilistic Topic Models ◽

Probabilistic Topic Modeling ◽

Transfer Mechanisms

Probabilistic topic modeling is a common first step in crosslingual tasks to enable knowledge transfer and extract multilingual features. Although many multilingual topic models have been developed, their assumptions about the training corpus are quite varied, and it is not clear how well the different models can be utilized under various training conditions. In this article, the knowledge transfer mechanisms behind different multilingual topic models are systematically studied, and through a broad set of experiments with four models on ten languages, we provide empirical insights that can inform the selection and future development of multilingual topic models.
