An Empirical Study on Crosslingual Transfer in Probabilistic Topic Models

Probabilistic topic modeling is a common first step in crosslingual tasks to enable knowledge transfer and extract multilingual features. Although many multilingual topic models have been developed, their assumptions about the training corpus are quite varied, and it is not clear how well the different models can be utilized under various training conditions. In this article, the knowledge transfer mechanisms behind different multilingual topic models are systematically studied, and through a broad set of experiments with four models on ten languages, we provide empirical insights that can inform the selection and future development of multilingual topic models.


Predicting inpatient clinical order patterns with probabilistic topic models vs conventional order sets

Journal of the American Medical Informatics Association ◽ 10.1093/jamia/ocw136 ◽ 2016 ◽ Vol 24 (3) ◽ pp. 472-480 ◽ Cited By ~ 13

Author(s): Jonathan H Chen ◽ Mary K Goldstein ◽ Steven M Asch ◽ Lester Mackey ◽ Russ B Altman

Keyword(s): Decision Support ◽ Topic Modeling ◽ Operating Characteristic ◽ Topic Model ◽ Characteristic Curve ◽ Topic Models ◽ Probabilistic Topic Models ◽ Probabilistic Topic Modeling ◽ Order Sets ◽ Operating Characteristic Curve

Objective: Build probabilistic topic model representations of hospital admissions processes and compare their ability to predict clinical order patterns against that of preconstructed order sets. Materials and Methods: The authors evaluated the first 24 hours of structured electronic health record data for > 10 K inpatients. Drawing an analogy between structured items (e.g., clinical orders) and words in a text document, the authors performed latent Dirichlet allocation probabilistic topic modeling. These topic models use initial clinical information to predict clinical orders for a separate validation set of > 4 K patients. The authors evaluated these topic model-based predictions vs existing human-authored order sets by area under the receiver operating characteristic curve, precision, and recall for subsequent clinical orders. Results: Existing order sets predict clinical orders used within 24 hours with area under the receiver operating characteristic curve 0.81, precision 16%, and recall 35%. This can be improved to 0.90, 24%, and 47% (P < 10^-20) by using probabilistic topic models to summarize clinical data into up to 32 topics. Many of these latent topics yield natural clinical interpretations (e.g., “critical care,” “pneumonia,” “neurologic evaluation”). Discussion: Existing order sets tend to provide nonspecific, process-oriented aid, with usability limitations impairing more precise, patient-focused support. Algorithmic summarization has the potential to breach this usability barrier by automatically inferring patient context, but with potential tradeoffs in interpretability. Conclusion: Probabilistic topic modeling provides an automated approach to detect thematic trends in patient care and generate decision support content. A potential use case finds related clinical orders for decision support.
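To make the abstract's setup concrete, here is a minimal sketch of two ideas it mentions: treating each patient's structured items as a bag of "words", and scoring a recommended order list by precision and recall. The patient records, order names, and the `precision_recall` helper are hypothetical illustrations, not the authors' data or code.

```python
from collections import Counter


def precision_recall(recommended, actual):
    """Return (precision, recall) of recommended orders against orders actually placed."""
    recommended_set = set(recommended)
    actual_set = set(actual)
    hits = len(recommended_set & actual_set)
    precision = hits / len(recommended_set) if recommended_set else 0.0
    recall = hits / len(actual_set) if actual_set else 0.0
    return precision, recall


# Hypothetical patients: each "document" is the list of structured items
# recorded for that patient, analogous to the words of a text document.
patients = {
    "patient_a": ["cbc", "blood_culture", "chest_xray", "cbc"],
    "patient_b": ["ecg", "troponin", "cbc"],
}

# Bag-of-words counts per patient: the kind of input a topic model such as
# latent Dirichlet allocation would be trained on.
bags = {pid: Counter(items) for pid, items in patients.items()}
print(bags["patient_a"]["cbc"])

# Hypothetical evaluation: orders suggested for a patient vs. orders placed later.
recommended = ["cbc", "chest_xray", "blood_culture", "ecg"]
actual = ["cbc", "blood_culture", "antibiotics"]
precision, recall = precision_recall(recommended, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these toy lists, two of the four recommended orders were actually placed and two of the three placed orders were recommended, so precision is 0.50 and recall is about 0.67.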


Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA

Information Retrieval ◽ 10.1007/s10791-010-9141-9 ◽ 2010 ◽ Vol 14 (2) ◽ pp. 178-203 ◽ Cited By ~ 114

Author(s): Yue Lu ◽ Qiaozhu Mei ◽ ChengXiang Zhai

Keyword(s): Empirical Study ◽ Task Performance ◽ Topic Models ◽ Probabilistic Topic Models


Industrial Federated Topic Modeling

ACM Transactions on Intelligent Systems and Technology ◽ 10.1145/3418283 ◽ 2021 ◽ Vol 12 (1) ◽ pp. 1-22

Author(s): Di Jiang ◽ Yongxin Tong ◽ Yuanfeng Song ◽ Xueyang Wu ◽ Weiwei Zhao ◽ ...

Keyword(s): Topic Modeling ◽ Data Privacy ◽ Topic Models ◽ Real Life ◽ Industrial Applications ◽ High Quality ◽ Heterogeneous Model ◽ Data Scarcity ◽ Probabilistic Topic Modeling ◽ Training Topic

Probabilistic topic modeling has been applied in a variety of industrial applications. Training a high-quality model usually requires a massive amount of data to provide comprehensive co-occurrence information for the model to learn. However, industrial data such as medical or financial records are often proprietary or sensitive, which precludes uploading to data centers. Hence, training topic models in industrial scenarios using conventional approaches faces a dilemma: A party (i.e., a company or institute) has to either tolerate data scarcity or sacrifice data privacy. In this article, we propose a framework named Industrial Federated Topic Modeling (iFTM), in which multiple parties collaboratively train a high-quality topic model by simultaneously alleviating data scarcity and maintaining immunity to privacy adversaries. iFTM is inspired by federated learning, supports two representative topic models (i.e., Latent Dirichlet Allocation and SentenceLDA) in industrial applications, and consists of novel techniques such as private Metropolis-Hastings, topic-wise normalization, and heterogeneous model integration. We conduct quantitative evaluations to verify the effectiveness of iFTM and deploy iFTM in two real-life applications to demonstrate its utility. Experimental results verify iFTM’s superiority over conventional topic modeling.


Topic Modeling Using Latent Dirichlet Allocation

ACM Computing Surveys ◽ 10.1145/3462478 ◽ 2022 ◽ Vol 54 (7) ◽ pp. 1-35

Author(s): Uttam Chauhan ◽ Apurva Shah

Keyword(s): Topic Modeling ◽ Latent Dirichlet Allocation ◽ Research Work ◽ Topic Models ◽ Small Subset ◽ Distributed Environment ◽ Future Directions ◽ Probabilistic Topic Modeling ◽ Modeling Techniques ◽ Evaluation Techniques

A mammoth text corpus cannot be handled without summarizing it into a relatively small subset, and computational tools are badly needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives. In addition, we explore research on topic modeling in distributed environments and on topic visualization approaches. We also briefly cover implementation and evaluation techniques for topic models. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.


Impacts of Language Barrier and Social Knowledge on Knowledge Transfer: An Empirical Study of Korean Subsidiaries in Vietnam

INTERNATIONAL BUSINESS REVIEW ◽ 10.21739/ibr.2010.06.14.2.51 ◽ 2010 ◽ Vol 14 (2) ◽ pp. 51

Author(s): Kyoung Kim

Keyword(s): Knowledge Transfer ◽ Empirical Study ◽ Language Barrier ◽ Social Knowledge


Improving Topic Coherence Using Entity Extraction Denoising

Prague Bulletin of Mathematical Linguistics ◽ 10.2478/pralin-2018-0004 ◽ 2018 ◽ Vol 110 (1) ◽ pp. 85-101 ◽ Cited By ~ 1

Author(s): Ronald Cardenas ◽ Kevin Bello ◽ Alberto Coronado ◽ Elizabeth Villota

Keyword(s): Topic Modeling ◽ Human Perception ◽ Relevant Information ◽ Entity Recognition ◽ Entity Extraction ◽ Fine Grained ◽ Job Advertisement ◽ Coherence Score ◽ Probabilistic Topic Modeling ◽ Promising Solution

Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method, and evaluating such a model is an interesting problem in its own right. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word-set correlation scores. In this paper, we show experimental evidence that the topic coherence score improves when the training corpus is restricted to the relevant information in each document, obtained by entity recognition. We experiment with job advertisement data and find that with this approach topic models improve interpretability by about 40 percentage points on average. Our analysis also reveals that, using the extracted text chunks, some redundant topics are joined while others are split into more skill-specific topics. Fine-grained topics observed in models using the whole text are preserved.
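The abstract does not say which coherence score was used. As one illustration of a word-set correlation score, the sketch below averages normalized pointwise mutual information (NPMI) over pairs of a topic's top words, using document-level co-occurrence. The function name `npmi_coherence`, the toy job-ad documents, and the handling of degenerate cases are assumptions made for this example, not the paper's method.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of top words, using document co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # never co-occur: lowest NPMI value
        elif p12 == 1:
            scores.append(1.0)  # assumption: both words in every document counts as 1.0
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical tokenized job advertisements.
documents = [
    ["python", "sql", "java"],
    ["python", "sql"],
    ["nurse", "patient"],
    ["nurse", "patient", "care"],
]

print(npmi_coherence(["python", "sql"], documents))
print(npmi_coherence(["python", "nurse"], documents))
```

In this toy corpus, "python" and "sql" always appear together, so their topic scores at the top of the NPMI range, while "python" and "nurse" never co-occur and score at the bottom.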


A Study on the Characteristics of Academic Topics Related to Renewable Energy Using the Structural Topic Modeling and the Weak Signal Concept

Energies ◽ 10.3390/en14051497 ◽ 2021 ◽ Vol 14 (5) ◽ pp. 1497

Author(s): Chankook Park ◽ Minkyu Kim

Keyword(s): Renewable Energy ◽ Energy Storage ◽ Topic Modeling ◽ Academic Research ◽ High Rate ◽ Weak Signals ◽ Energy Potential ◽ Rate Of Increase ◽ Probabilistic Topic Modeling ◽ To Receive

For scientists to contribute to the development of renewable energy, it is important to examine in detail how the distribution of academic research topics related to renewable energy is structured and which topics are likely to receive new attention in the future. This study uses an advanced probabilistic topic modeling method to statistically examine the temporal changes of renewable energy topics in academic abstracts from 2010–2019, and explores the properties of the topics from the perspective of future signs such as weak signals. The results show that, among strong signals, methods for optimally integrating renewable energy into the power grid receive great attention. Among weak signals, interest in large-capacity energy storage systems such as hydrogen, supercapacitors, and compressed air energy storage showed a high rate of increase. Not-strong-but-well-known signals include comprehensive topics such as renewable energy potential, barriers, and policies. The approach of this study is applicable not only to renewable energy but also to other subjects.


Plant Phenotyping using Probabilistic Topic Models: Uncovering the Hyperspectral Language of Plants

Scientific Reports ◽ 10.1038/srep22482 ◽ 2016 ◽ Vol 6 (1) ◽ Cited By ~ 58

Author(s): Mirwaes Wahabzada ◽ Anne-Katrin Mahlein ◽ Christian Bauckhage ◽ Ulrike Steiner ◽ Erich-Christian Oerke ◽ ...

Keyword(s): Topic Models ◽ Plant Phenotyping ◽ Probabilistic Topic Models


Research of Knowledge Transfer Mechanisms and Environmental Uncertainty towards Renewable Energy in Thailand

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.522-524.1850 ◽ 2014 ◽ Vol 522-524 ◽ pp. 1850-1852

Author(s): Chun Wang Tsou ◽ Pakarapong Supakarapongkul ◽ Saksit Pornjirattikal ◽ Yin Tsuo Huang

Keyword(s): Knowledge Transfer ◽ Competitive Advantage ◽ Environmental Uncertainty ◽ Project Managers ◽ Dynamic Capability ◽ Future Research ◽ Knowledge Transfers ◽ Energy Companies ◽ The Relationship ◽ Transfer Mechanisms

This explanatory research explores the relationship among environmental uncertainty, knowledge transfer mechanisms, dynamic capability, and competitive advantage. A total of 235 project managers employed by energy companies in Thailand were invited to participate in the study. The findings indicated that (a) through knowledge transfer mechanisms, project teams could develop an energy enterprise's core competence and build its competitive advantage, (b) the relationship between environmental uncertainty and knowledge transfer mechanisms is negative, and (c) dynamic capability and competitive advantage have a positive relationship. The limitations of the study regarding generalization, and recommendations for future research to replicate the study in other countries, are also included.


Probabilistic Topic Models

Practical Text Analytics - Advances in Analytics and Data Science ◽ 10.1007/978-3-319-95663-3_8 ◽ 2018 ◽ pp. 117-130 ◽ Cited By ~ 1

Author(s): Murugan Anandarajan ◽ Chelsey Hill ◽ Thomas Nolan

Keyword(s): Topic Models ◽ Probabilistic Topic Models
