A Tutorial on Probabilistic Topic Models for Text Data Retrieval and Analysis

Author(s):  
ChengXiang Zhai ◽  
Chase Geigle
2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Mirwaes Wahabzada ◽  
Anne-Katrin Mahlein ◽  
Christian Bauckhage ◽  
Ulrike Steiner ◽  
Erich-Christian Oerke ◽  
...  

Author(s):  
Murugan Anandarajan ◽  
Chelsey Hill ◽  
Thomas Nolan

2020 ◽  
Vol 25 (6) ◽  
pp. 755-769
Author(s):  
Noorullah R. Mohammed ◽  
Moulana Mohammed

Text data clustering organizes a set of text documents into a desired number of coherent and meaningful sub-clusters. Modeling text documents in terms of derived topics is a vital task in text data clustering. Each tweet is treated as a text document, and various topic models are used to model tweets. In existing topic models, the clustering tendency of tweets is initially assessed using Euclidean dissimilarity features. The cosine metric is better suited to a more informative assessment, especially for text clustering. This paper therefore develops a novel cosine-based internal and external validity assessment of cluster tendency to improve the computational efficiency of tweet data clustering. In the experiments, tweet clustering results are evaluated using cluster validity indices. The experimental results show that the cosine-based internal and external validity metrics outperform the others on benchmark and Twitter-based datasets.
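
The contrast between Euclidean and cosine dissimilarity drawn in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the whitespace tokenizer, the raw term-count representation, the function names, and the three example tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase, split on whitespace, and count terms (a deliberately simple tokenizer)."""
    return Counter(text.lower().split())


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_dissimilarity(a, b):
    """Return the Euclidean distance between two sparse term-count vectors."""
    terms = a.keys() | b.keys()
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))


# Hypothetical tweets: the second repeats the first three times, the third is unrelated.
short = term_counts("topic models cluster tweets")
repeated = term_counts("topic models cluster tweets " * 3)
unrelated = term_counts("football match tonight live")

print(cosine_dissimilarity(short, repeated), cosine_dissimilarity(short, unrelated))
print(euclidean_dissimilarity(short, repeated), euclidean_dissimilarity(short, unrelated))
```

In this toy case the Euclidean distance places the repeated tweet farther from the original than the unrelated tweet, because it is sensitive to document length, while the cosine dissimilarity depends only on the direction of the term vectors.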


Author(s):  
Dat Quoc Nguyen ◽  
Richard Billingsley ◽  
Lan Du ◽  
Mark Johnson

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
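
One way a Dirichlet multinomial topic-word distribution might be combined with latent word vectors is a two-component mixture, sketched below. The abstract does not give the model equations, so the mixture form, the softmax over dot products, the mixing weight `lam`, the smoothing value `beta`, and all names and numbers here are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_word_distribution(topic_word_counts, beta, topic_vector, word_vectors, lam):
    """Mix a smoothed count-based word distribution with a vector-based one.

    topic_word_counts: counts of each vocabulary word assigned to the topic
    beta: symmetric Dirichlet smoothing parameter (assumed)
    topic_vector: hypothetical latent vector for the topic
    word_vectors: one pre-trained vector per vocabulary word
    lam: assumed mixing weight in [0, 1] for the vector-based component
    """
    vocab_size = len(topic_word_counts)
    total = sum(topic_word_counts) + vocab_size * beta
    count_part = [(c + beta) / total for c in topic_word_counts]
    scores = [sum(t * w for t, w in zip(topic_vector, vec)) for vec in word_vectors]
    vector_part = softmax(scores)
    return [(1 - lam) * c + lam * v for c, v in zip(count_part, vector_part)]


# Toy vocabulary of three words; the third was never observed in the small corpus
# but has a vector close to the first word's.
counts = [5, 1, 0]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
probs = mixed_word_distribution(counts, beta=0.01, topic_vector=[2.0, 0.0],
                                word_vectors=vectors, lam=0.6)
print(probs)
```

With these toy values, the unseen third word receives more probability than the second word, which is the kind of effect external word vectors are meant to provide on small or short-document corpora.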


2012 ◽  
Vol 11 (3) ◽  
pp. 203-215 ◽  
Author(s):  
Xin Chen ◽  
TingTing He ◽  
Xiaohua Hu ◽  
Yanhong Zhou ◽  
Yuan An ◽  
...  
