probabilistic topic modeling
Recently Published Documents


Total documents: 47 (five years: 18)
H-index: 10 (five years: 3)

2022 ◽ Vol 54 (7) ◽ pp. 1-35
Author(s): Uttam Chauhan ◽ Apurva Shah

We cannot deal with a mammoth text corpus without summarizing it into a relatively small subset. A computational tool is sorely needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models in multilingual perspectives. In addition, research on topic modeling in distributed environments and on topic visualization approaches is explored. We also briefly cover implementation and evaluation techniques for topic models. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.


Energies ◽ 2021 ◽ Vol 14 (24) ◽ pp. 8446
Author(s): Minkyu Kim ◽ Chankook Park

With the emergence of new technologies and policies for the transition to clean energy, the household energy consumption sector is also changing. In response to policy, environmental, and technical changes, researchers need to identify the significant issues related to household energy consumption and comprehensively analyze which issues are likely to attract attention in the future, in order to contribute to research in the household sector. Based on the abstracts of academic papers published between 2011 and 2020, this study uses probabilistic topic modeling to improve understanding of academic issues in the household energy consumption sector and statistically reviews how those issues change over time. The analysis shows that topics related to digitalization and renewable energy, such as microgrid systems, smart homes, residential solar power generation systems, and non-intrusive load monitoring (NILM), are strong signals and are being actively studied. Weak signals, which may attract attention in the future, include discussions of coal energy consumption, air pollutant emissions, energy poverty, and energy performance evaluation. The results also show that carbon neutrality themes, such as decarbonization and fossil energy consumption reduction, are expanding into research on the household energy consumption sector.


Author(s): Vitor Ayres Principe ◽ Rodrigo Gomes de Souza Vale ◽ Juliana Brandão Pinto de Castro ◽ Luiz Marcelo Carvano ◽ Roberto André Pereira Henriques ◽ ...

Energies ◽ 2021 ◽ Vol 14 (5) ◽ pp. 1497
Author(s): Chankook Park ◽ Minkyu Kim

For scientists to contribute to the development of renewable energy, it is important to examine in detail how the distribution of academic research topics related to renewable energy is structured and which topics are likely to receive new attention in the future. This study uses an advanced probabilistic topic model to statistically examine the temporal changes of renewable energy topics in academic abstracts from 2010–2019 and explores the properties of the topics from the perspective of future signs such as weak signals. Among strong signals, methods for optimally integrating renewable energy into the power grid receive great attention. Among weak signals, interest in large-capacity energy storage systems such as hydrogen, supercapacitors, and compressed air energy storage shows a high rate of increase. Not-strong-but-well-known signals include comprehensive topics such as renewable energy potential, barriers, and policies. The approach of this study is applicable not only to renewable energy but also to other subjects.
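The abstract does not spell out how topics are assigned to signal categories. As a rough illustration only, the sketch below assumes a quadrant scheme in which each topic is placed by its average share of the corpus and its average year-over-year growth rate, with the median of each quantity as the cut-off. The function names, the median thresholds, the "latent" label for the fourth quadrant, and the toy numbers are all assumptions made for this example, not the authors' procedure or data.

```python
from statistics import mean, median


def growth_rate(series):
    """Average year-over-year relative change of a list of yearly topic shares."""
    changes = [(b - a) / a for a, b in zip(series, series[1:]) if a > 0]
    return mean(changes) if changes else 0.0


def classify_signals(topic_shares):
    """Label each topic by average share and growth rate (assumed quadrant scheme).

    topic_shares: dict mapping a topic name to its list of yearly shares.
    Cut-offs are the medians across topics, which is an assumption of this sketch.
    """
    avg = {t: mean(s) for t, s in topic_shares.items()}
    growth = {t: growth_rate(s) for t, s in topic_shares.items()}
    avg_cut = median(avg.values())
    growth_cut = median(growth.values())

    labels = {}
    for topic in topic_shares:
        high_share = avg[topic] >= avg_cut
        high_growth = growth[topic] >= growth_cut
        if high_share and high_growth:
            labels[topic] = "strong"
        elif high_growth:
            labels[topic] = "weak"
        elif high_share:
            labels[topic] = "not-strong-but-well-known"
        else:
            labels[topic] = "latent"
    return labels


# Toy, made-up yearly shares for four hypothetical topics.
toy_shares = {
    "grid": [0.30, 0.33, 0.36],
    "storage": [0.05, 0.07, 0.10],
    "policy": [0.40, 0.40, 0.40],
    "misc": [0.10, 0.09, 0.08],
}
print(classify_signals(toy_shares))
```

With real data, the yearly shares would come from the fitted topic model, for example by averaging each topic's document-level proportion per publication year.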


2021 ◽ Vol ahead-of-print (ahead-of-print)
Author(s): Marcio Pereira Basilio ◽ Valdecy Pereira ◽ Max William Coelho Moreira de Oliveira ◽ Antonio Fernandes da Costa Neto ◽ Orlinda Claudia Rosa de Moraes ◽ ...

Purpose: The database of the Web of Science (WoS) was searched for publications from January 1945–May 7, 2020 on the topic of domestic violence in titles, abstracts and keywords. The references were analyzed using the R bibliometrix package, and abstracts were analyzed using latent Dirichlet allocation (LDA) with collapsed Gibbs sampling to obtain topics related to domestic violence.
Design/methodology/approach: The aim of the study is to explore and provide an overview of research carried out on domestic violence, in its various aspects, over the past fifty years.
Findings: As a result of the research, the authors can assert that in the last fifty years, 32,298 authors have produced 19,495 documents on the theme of policing strategy and related subjects in 111 countries. Scientific production in this area grows at a rate of 12.81 per year. The United States of America is the leading country in publications with 48.14%, followed by the United Kingdom with 7.57% and Australia with 6.05%. Regarding universities, the highlight is the University of California with 664 publications, followed by the University of London with 515 and the University of North Carolina with 484. As for journals, the highlight is the Journal of Interpersonal Violence, Journal of Family Violence and Violence Against Women, which account for more than 14.32% of all indexed literature. Regarding the authors, the highlight is Campbell J.C and Feder G. Probabilistic topic modeling revealed that 18% of the topics concentrate 90% of all tokens. Topic 1 accounts for 27.9% of the sample and conducts research related to intimate partner violence.
Practical implications: As a practical implication of using the LDA in the bibliographic review, we infer that its capacity to explore large masses of data allows the researcher to explore an infinitely greater amount than the traditional methods of systematic literature review.
Originality/value: The value of these studies is summarized in the presentation of an overview on the theme in the last fifty years, offering the opportunity for other researchers to use this research as a starting point for other analyses.
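For readers unfamiliar with the method named in this abstract, the following is a minimal sketch of collapsed Gibbs sampling for LDA in plain Python. It is not the authors' implementation (the abstract does not say which software fitted the model); the function name, the symmetric hyperparameters `alpha` and `beta`, and the integer word-id encoding are assumptions chosen for illustration.

```python
import random


def lda_collapsed_gibbs(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts) after `iters` sweeps.
    """
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    nkw = [[0] * vocab_size for _ in range(num_topics)]   # tokens of word w in topic k
    nk = [0] * num_topics                                 # total tokens in topic k
    z = []                                                # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(assignments)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                # The document-length denominator is constant in t and omitted.
                weights = [
                    (ndk[d][t] + alpha)
                    * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw
```

The share of tokens assigned to each topic, the kind of figure quoted in the findings, can be read off the result as `sum(nkw[k])` divided by the total token count.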


2021 ◽ Vol 12 (1) ◽ pp. 1-22
Author(s): Di Jiang ◽ Yongxin Tong ◽ Yuanfeng Song ◽ Xueyang Wu ◽ Weiwei Zhao ◽ ...

Probabilistic topic modeling has been applied in a variety of industrial applications. Training a high-quality model usually requires a massive amount of data to provide comprehensive co-occurrence information for the model to learn. However, industrial data such as medical or financial records are often proprietary or sensitive, which precludes uploading to data centers. Hence, training topic models in industrial scenarios using conventional approaches faces a dilemma: A party (i.e., a company or institute) has to either tolerate data scarcity or sacrifice data privacy. In this article, we propose a framework named Industrial Federated Topic Modeling (iFTM), in which multiple parties collaboratively train a high-quality topic model by simultaneously alleviating data scarcity and maintaining immunity to privacy adversaries. iFTM is inspired by federated learning, supports two representative topic models (i.e., Latent Dirichlet Allocation and SentenceLDA) in industrial applications, and consists of novel techniques such as private Metropolis-Hastings, topic-wise normalization, and heterogeneous model integration. We conduct quantitative evaluations to verify the effectiveness of iFTM and deploy iFTM in two real-life applications to demonstrate its utility. Experimental results verify iFTM’s superiority over conventional topic modeling.


Information ◽ 2020 ◽ Vol 11 (11) ◽ pp. 518
Author(s): Mubashar Mustafa ◽ Feng Zeng ◽ Hussain Ghulam ◽ Hafiz Muhammad Arslan

Document clustering groups documents according to certain semantic features. Topic models have a richer semantic structure and considerable potential for helping users understand document corpora. Unfortunately, because they are purely unsupervised, this potential is stymied on text documents whose categories overlap. To solve this problem, some semi-supervised models have been proposed for the English language. However, no such work is available for Urdu, a low-resource language. Document clustering therefore remains a challenging task in Urdu, which has its own morphology, syntax and semantics. In this study, we propose a semi-supervised framework for Urdu document clustering that deals with the challenges of Urdu morphology. The proposed model combines pre-processing techniques, a seeded-LDA model and Gibbs sampling; we name it seeded-Urdu Latent Dirichlet Allocation (seeded-ULDA). We apply the proposed model and other methods to Urdu news datasets for categorization. Two conditions are considered for document clustering: a “dataset without overlapping”, in which all classes are distinct, and a “dataset with overlapping”, in which the categories overlap and the classes are connected to each other. The aim of this study is threefold. First, it shows that unsupervised models (Latent Dirichlet Allocation (LDA), non-negative matrix factorization (NMF) and K-means) give satisfactory results on the dataset without overlapping. Second, it shows that these unsupervised models do not perform well on the dataset with overlapping, because on this dataset they find some topics that are neither entirely meaningful nor effective in extrinsic tasks. Third, the proposed semi-supervised model, seeded-ULDA, performs well on both datasets because it offers a straightforward and effective way to instruct topic models to find topics of specific interest. The paper shows that the semi-supervised model, seeded-ULDA, provides significant results compared with the unsupervised algorithms.
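The abstract does not describe how the seed words enter the model. One common way to steer LDA toward topics of interest, shown here purely as an assumed illustration, is to give each topic an asymmetric topic-word Dirichlet prior that puts extra mass on that topic's seed words. The function name, the `base_beta` and `boost` values, and the English placeholder vocabulary are hypothetical.

```python
def seeded_topic_word_prior(vocab, seed_words, base_beta=0.01, boost=1.0):
    """Build one topic-word prior row per topic, boosted on that topic's seeds.

    vocab: list of word strings (position = word id).
    seed_words: list with one entry per topic, each a list of seed strings.
    Returns a list of rows; row k is the Dirichlet prior over words for topic k.
    """
    index = {word: i for i, word in enumerate(vocab)}
    prior = []
    for seeds in seed_words:
        row = [base_beta] * len(vocab)
        for word in seeds:
            if word in index:  # seeds missing from the vocabulary are skipped
                row[index[word]] += boost
        prior.append(row)
    return prior


# Hypothetical two-topic example: a sports topic and a politics topic.
vocab = ["match", "team", "vote", "party", "market"]
prior = seeded_topic_word_prior(vocab, [["match", "team"], ["vote", "party"]])
for row in prior:
    print(row)
```

In a collapsed Gibbs sampler, row k would replace the symmetric β in the word term, giving (n_kw + β_kw) / (n_k + Σ_w β_kw), so seed words start out more probable under their intended topic while the data can still override the prior.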


2020 ◽ Vol 14 (2) ◽ pp. 1-27
Author(s): Ting Hua ◽ Chang-Tien Lu ◽ Jaegul Choo ◽ Chandan K. Reddy

2020 ◽ Vol 46 (1) ◽ pp. 95-134
Author(s): Shudong Hao ◽ Michael J. Paul

Probabilistic topic modeling is a common first step in crosslingual tasks to enable knowledge transfer and extract multilingual features. Although many multilingual topic models have been developed, their assumptions about the training corpus are quite varied, and it is not clear how well the different models can be utilized under various training conditions. In this article, the knowledge transfer mechanisms behind different multilingual topic models are systematically studied, and through a broad set of experiments with four models on ten languages, we provide empirical insights that can inform the selection and future development of multilingual topic models.

