Readitopics: Make Your Topic Models Readable via Labeling and Browsing

Author(s):  
Julien Velcin ◽  
Antoine Gourru ◽  
Erwan Giry-Fouquet ◽  
Christophe Gravier ◽  
Mathieu Roche ◽  
...  

Readitopics is a new tool for browsing a textual corpus that showcases recent work on topic labeling and topic coherence. We demonstrate the potential of these techniques for gaining a deeper understanding of the topics that structure different datasets. The tool is provided as a Web demo, but it can also be installed to experiment with your own dataset. It can be further extended to handle more advanced topic modeling techniques.

2022 ◽  
Vol 54 (7) ◽  
pp. 1-35
Author(s):  
Uttam Chauhan ◽  
Apurva Shah

We cannot work with a mammoth text corpus without summarizing it into a relatively small subset. Computational tools are needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains the structure of an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling and review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from a multilingual perspective. We also explore research on topic modeling in distributed environments and on topic visualization approaches, and briefly cover implementation and evaluation techniques for topic models. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.


Author(s):  
Zarmeen Nasim

This research is an endeavor to combine deep-learning-based language modeling with classical topic modeling techniques to produce interpretable topics for a given set of documents in Urdu, a low-resource language. Existing topic modeling techniques produce a collection of words, often uninterpretable, as suggested topics without integrating them into a semantically correct phrase or sentence. The proposed approach would first build an accurate Part-of-Speech (POS) tagger for the Urdu language using a publicly available corpus of many million sentences. In the next step, using semantically rich feature extraction approaches including Word2Vec and BERT, the proposed approach would experiment with different clustering and topic modeling techniques to produce a list of potential topics for a given set of documents. Finally, this list of topics would be sent to a labeler module to produce syntactically correct phrases that represent interpretable topics.


2021 ◽  
Author(s):  
Retno Kusumastuti ◽  
Mesnan Silalahi ◽  
Anugerah Yuka Asmara ◽  
Ria Hardiyati ◽  
Visnhu Juwono

Indigenous people have deep local knowledge of environmental sustainability and natural resource utilization, which are sources of innovations that are often drivers of economic growth in rural areas. This study explores the knowledge structure of indigenous innovation in village enterprises through content analysis of research publications. The resulting knowledge structure can be used to set up a roadmap for studies on village enterprise and, in a broader context, to build metadata as a foundation for an evaluation system of village enterprise. The authors deploy topic modeling and co-word analyses to scrutinize 775 village enterprise research articles from the Scopus database and 665 papers from ScienceDirect. In the topic modeling step, topic models of village enterprises are set up. The topics found are local ownership (such as market and property), land, services (housing, health care), economy and public policy, financial service micro-credit, environmental pollution control, local business sustainability, social entrepreneurship, household income, bioenergy-based electrification, and BUMDes management. Four sectors of the natural resource-based indigenous economy were identified: traditional food production, bio-energy for fuel and electricity, agriculture, and tourism. The topic models are used to comprehend the knowledge structure of village enterprises, with the focus on uncovering the context of indigenous village enterprise and its state of the art.


Symmetry ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 1486
Author(s):  
Zhinan Gou ◽  
Zheng Huo ◽  
Yuanzhen Liu ◽  
Yi Yang

Supervised topic modeling has been successfully applied to document classification and tag recommendation in recent years. However, most existing models neglect the fact that topic terms differ in their ability to distinguish topics. In this paper, we propose a term frequency-inverse topic frequency (TF-ITF) method for constructing a supervised topic model, in which the weight of each topic term indicates its ability to distinguish topics. We conduct a series of experiments with both symmetric and asymmetric Dirichlet prior parameters. Experimental results demonstrate that introducing TF-ITF into a supervised topic model outperforms several state-of-the-art supervised topic models.
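The abstract does not give the TF-ITF formula, so the following is only an illustrative sketch built by analogy with TF-IDF. It assumes that a term's weight in a topic is its relative frequency within that topic multiplied by the logarithm of the number of topics divided by the number of topics containing the term. The function name `tf_itf` and the input layout (per-topic term counts) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_itf(topic_term_counts):
    """Weight each term per topic, by analogy with TF-IDF (an assumption).

    topic_term_counts maps a topic name to a Counter of term counts.
    Returns a dict mapping each topic to a dict of term weights.
    """
    num_topics = len(topic_term_counts)

    # Count in how many topics each term occurs at least once.
    topic_freq = Counter()
    for counts in topic_term_counts.values():
        topic_freq.update(term for term, c in counts.items() if c > 0)

    weights = {}
    for topic, counts in topic_term_counts.items():
        total = sum(counts.values())
        weights[topic] = {}
        if total == 0:
            continue  # empty topic: no terms to weight
        for term, c in counts.items():
            if c > 0:
                tf = c / total  # relative frequency within the topic
                itf = math.log(num_topics / topic_freq[term])  # assumed ITF form
                weights[topic][term] = tf * itf
    return weights


# Toy example with two hypothetical topics.
example = {
    "sports": Counter({"game": 3, "team": 1}),
    "politics": Counter({"game": 1, "vote": 3}),
}
example_weights = tf_itf(example)
```

Under this assumed form, a term that appears in every topic (here "game") gets weight zero because it does not help distinguish topics, while a term confined to one topic (here "vote") keeps a positive weight.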


Author(s):  
Ponmalar R ◽  
Ponnarasi D ◽  
Sangeetha A ◽  
Kingsy Grace R

Text mining is the process of converting unstructured data into meaningful data. It may be loosely characterized as the process of analyzing text to extract information that is useful for particular purposes. Topic modeling is a form of text mining, a way of identifying patterns in a corpus. The topics produced by topic modeling techniques are clusters of similar words that frequently occur together. Topic modeling is also a frequently used text-mining tool for discovering hidden semantic structures in a body of text. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently. This paper presents a survey of topic modeling in clinical documents.

