Review of Trends in Topic Modeling Techniques, Tools, Inference Algorithms and Applications

Author(s):  
Christine K. Mulunda ◽  
Peter W. Wagacha ◽  
Lawrence Muchemi
Author(s):  
Zarmeen Nasim

This research is an endeavor to combine deep-learning-based language modeling with classical topic modeling techniques to produce interpretable topics for a given set of documents in Urdu, a low resource language. The existing topic modeling techniques produce a collection of words, often un-interpretable, as suggested topics without integrat-ing them into a semantically correct phrase/sentence. The proposed approach would first build an accurate Part of Speech (POS) tagger for the Urdu Language using a publicly available corpus of many million sentences. Using semanti-cally rich feature extraction approaches including Word2Vec and BERT, the proposed approach, in the next step, would experiment with different clus-tering and topic modeling techniques to produce a list of potential topics for a given set of documents. Finally, this list of topics would be sent to a labeler module to produce syntactically correct phrases that will represent interpretable topics.


Author(s):  
Ponmalar R ◽  
Ponnarasi D ◽  
Sangeetha A ◽  
Kingsy Grace R

Text mining is a process of converting unstructured data into meaningful data. It may be loosely characterized as the process of analyzing text to extract information that is useful for particular purposes. Topic modeling is a form of text mining, a way of identifying patterns in a corpus. The topics produced by topic modeling techniques are clusters of similar words that are frequently occur together. Topic modeling is also a frequently used text-mining tool for discovery of hidden semantic structures in a text body. Intuitively, a document is about a particular topic, one would expect particular words to appear in the document more or less frequently. This paper, presents a survey on topic modeling in clinical documents.


2021 ◽  
Vol 26 (6) ◽  
Author(s):  
Camila Costa Silva ◽  
Matthias Galster ◽  
Fabian Gilson

AbstractTopic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and modeling parameters). Our study aims at describing how topic modeling has been applied in software engineering research with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modelled most, (3) data pre-processing and modeling parameters vary quite a bit and are often vaguely reported, and (4) manual topic naming (such as deducting names based on frequent words in a topic) is common.


Author(s):  
Julien Velcin ◽  
Antoine Gourru ◽  
Erwan Giry-Fouquet ◽  
Christophe Gravier ◽  
Mathieu Roche ◽  
...  

Readitopics provides a new tool for browsing a textual corpus that showcases several recent work on topic labeling and topic coherence. We demonstrate the potential of these techniques to get a deeper understanding of the topics that structure different datasets. This tool is provided as a Web demo but it can be installed to experiment with your own dataset. It can be further extended to deal with more advanced topic modeling techniques.


Sign in / Sign up

Export Citation Format

Share Document