Examining LDA2Vec and Tweet Pooling for Topic Modeling on Twitter Data

Author(s):  
Kristofferson Culmer ◽  
Jeffrey Uhlmann

The short lengths of tweets present a challenge for topic modeling to extend beyond what is provided explicitly by hashtag information. This is particularly true for LDA-based methods because the amount of information available from per-tweet statistical analysis is severely limited. In this paper we present LDA2Vec paired with temporal tweet pooling (LDA2VecTTP) and assess its performance on this problem relative to traditional LDA and to the Biterm Topic Model (Biterm), which was developed specifically for topic modeling on short text documents. We paired each of the three topic modeling algorithms with three tweet pooling schemes: no pooling, author-based pooling, and temporal pooling. We then conducted topic modeling on two Twitter datasets using each of the algorithms and tweet pooling schemes. Our results on the larger dataset suggest that LDA2VecTTP can produce higher coherence scores and more logically coherent and interpretable topics.
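
As a concrete illustration of the temporal pooling scheme referenced above, the sketch below groups tweets into pseudo-documents by time window. It is a minimal sketch, assuming each tweet carries a text and a timestamp field; the one-hour window and the field names are illustrative assumptions, not the paper's exact settings.

```python
from collections import defaultdict
from datetime import datetime

def temporal_pool(tweets, window_hours=1):
    """Group tweets into pseudo-documents by time window.

    `tweets` is assumed to be a list of dicts with 'text' and
    'created_at' (datetime) fields; the one-hour window is an
    illustrative choice, not the paper's exact setting.
    """
    pools = defaultdict(list)
    for tweet in tweets:
        ts = tweet["created_at"]
        # Bucket key: calendar date plus hour-of-day window index.
        key = (ts.date(), ts.hour // window_hours)
        pools[key].append(tweet["text"])
    # Each pooled document concatenates all tweets in its window.
    return [" ".join(texts) for texts in pools.values()]

tweets = [
    {"text": "LDA on tweets is hard", "created_at": datetime(2021, 5, 1, 9, 10)},
    {"text": "short texts are sparse", "created_at": datetime(2021, 5, 1, 9, 40)},
    {"text": "pooling helps a lot", "created_at": datetime(2021, 5, 1, 12, 5)},
]
print(temporal_pool(tweets))  # two pooled documents: the 9:00 window and the 12:00 window
```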

Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
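
As a minimal illustration of what the abstract describes (using Python's gensim rather than the Stata ldagibbs command), the sketch below fits LDA on a toy corpus and reads off both distributions; the corpus and parameter values are toy assumptions.

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy corpus; real applications use thousands of documents.
docs = [
    ["topic", "model", "text", "cluster"],
    ["stata", "command", "regression", "data"],
    ["text", "document", "topic", "word"],
    ["data", "regression", "estimate", "command"],
]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Two topics, a number chosen by the user, as the abstract describes.
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)

# Each document is a probability distribution over topics...
print(lda.get_document_topics(corpus[0]))
# ...and each topic is a probability distribution over words.
print(lda.show_topic(0, topn=4))
```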


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Lirong Qiu ◽  
Jia Yu

Against today's big data background, how to effectively mine useful information is a pressing problem. The purpose of this study is to construct a more effective method for mining the interest preferences of users in a particular field in this context. We mainly use a large volume of user text data from microblogs. LDA is an effective text mining method, but applying it directly to the large number of short microblog texts does not work well. In current topic modeling practice, short texts need to be aggregated into long texts to avoid data sparsity. However, the aggregated short texts are mixed with a lot of noise, reducing the accuracy of mining the user's interest preferences. In this paper, we propose Combining Latent Dirichlet Allocation (CLDA), a new topic model that can learn the latent topics of microblog short texts and long texts simultaneously. The data sparsity of short texts is avoided by aggregating long texts to assist in learning short texts, and the short texts are in turn used to filter the long texts to improve mining accuracy, so that long texts and short texts are effectively combined. Experimental results on a real microblog data set show that CLDA outperforms many state-of-the-art models in mining user interest, and we also confirm that CLDA performs well in recommender systems.
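
CLDA's full generative model is not reproduced in the abstract; the sketch below illustrates only the underlying aggregation idea with standard LDA as a stand-in: short texts are pooled by user into long pseudo-documents for training, and topic mixtures are then inferred for the original short texts. The toy posts and user-based pooling are illustrative assumptions.

```python
from collections import defaultdict
from gensim import corpora
from gensim.models import LdaModel

# Toy microblog posts keyed by user; real data has millions of posts.
posts = [
    ("u1", "phone camera review"), ("u1", "new phone battery life"),
    ("u2", "stock market rally"), ("u2", "market interest rates"),
]

# Aggregate each user's short texts into one long pseudo-document.
pools = defaultdict(list)
for user, text in posts:
    pools[user].extend(text.split())
long_docs = list(pools.values())

dictionary = corpora.Dictionary(long_docs)
long_corpus = [dictionary.doc2bow(d) for d in long_docs]

# Learn topics on the aggregated long texts to sidestep sparsity...
lda = LdaModel(long_corpus, num_topics=2, id2word=dictionary, passes=20, random_state=0)

# ...then infer topic mixtures for the original short texts.
for _, text in posts:
    bow = dictionary.doc2bow(text.split())
    print(text, "->", lda.get_document_topics(bow))
```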


Author(s):  
Yiming Wang ◽  
Ximing Li ◽  
Jihong Ouyang

Neural topic modeling provides a flexible, efficient, and powerful way to extract topic representations from text documents. Unfortunately, most existing models cannot handle text data with network links, such as web pages with hyperlinks and scientific papers with citations. To resolve this kind of data, we develop a novel neural topic model, namely the Layer-Assisted Neural Topic Model (LANTM), which can be interpreted from the perspective of variational auto-encoders. Our major motivation is to enhance the topic representation encoding by using not only text contents but also the assisting network links. Specifically, LANTM encodes the texts and network links into topic representations by an augmented network with graph convolutional modules, and decodes them by maximizing the likelihood of the generative process. Neural variational inference is adopted for efficient inference. Experimental results validate that LANTM significantly outperforms existing models on topic quality, text classification, and link prediction.
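
LANTM itself augments the encoder with graph convolutional modules over the network links; the sketch below shows only the text-only variational auto-encoder backbone the abstract describes, with toy dimensions. It is a minimal PyTorch sketch under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniNeuralTopicModel(nn.Module):
    """A minimal VAE-style topic model (text-only; LANTM's graph
    convolutional link encoder is omitted for brevity)."""

    def __init__(self, vocab_size, num_topics, hidden=100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.Softplus())
        self.to_mu = nn.Linear(hidden, num_topics)
        self.to_logvar = nn.Linear(hidden, num_topics)
        # Decoder weights play the role of topic-word distributions.
        self.decoder = nn.Linear(num_topics, vocab_size, bias=False)

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick for neural variational inference.
        theta = F.softmax(mu + torch.randn_like(mu) * (0.5 * logvar).exp(), dim=-1)
        logits = self.decoder(theta)
        recon = -(bow * F.log_softmax(logits, dim=-1)).sum(-1)       # reconstruction term
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)  # KL to N(0, I) prior
        return (recon + kl).mean()

model = MiniNeuralTopicModel(vocab_size=2000, num_topics=20)
loss = model(torch.rand(8, 2000))  # batch of 8 bag-of-words vectors
loss.backward()
```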


2019 ◽  
Vol 8 (2S8) ◽  
pp. 1366-1371

Topic modeling techniques such as LDA are considered useful tools for the statistical analysis of text document collections and other text-based data. Topic modeling has recently become an attractive research field due to its wide applications. However, traditional topic models such as LDA retain disadvantages stemming from the shortcomings of the bag-of-words (BOW) model, as well as poor performance in handling large text corpora. Therefore, in this paper we present a novel topic model, called LDA-GOW, which combines the word co-occurrence, or graph-of-words (GOW), model with the traditional LDA topic discovery model. The LDA-GOW topic model not only extracts more informative topics from text but is also able to scale the topic discovery process to large text corpora. To demonstrate the effectiveness of our proposed model, we compare it with the traditional LDA topic model on several standard datasets, including WebKB, Reuters-R8, and annotated scientific documents collected from the ACM digital library. Across all experiments, our proposed LDA-GOW model achieves approximately 70.86% accuracy.
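
The abstract does not spell out how GOW is integrated into LDA; the sketch below illustrates only the graph-of-words representation itself: an undirected co-occurrence graph whose edge weights count term pairs appearing within a sliding window. The window size of 3 is an illustrative assumption.

```python
from collections import Counter

def graph_of_words(tokens, window=3):
    """Build an undirected co-occurrence graph: nodes are unique terms,
    edge weights count co-occurrences within a sliding window."""
    edges = Counter()
    for i in range(len(tokens)):
        # Pair the token at i with every token inside its window.
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                edges[tuple(sorted((tokens[i], tokens[j])))] += 1
    return edges

doc = "topic model extracts topics from text corpus text".split()
for (u, v), w in graph_of_words(doc).items():
    print(f"{u} -- {v} (weight {w})")
```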


2018 ◽  
Vol 62 (3) ◽  
pp. 359-372 ◽  
Author(s):  
Ximing Li ◽  
Ang Zhang ◽  
Changchun Li ◽  
Lantian Guo ◽  
Wenting Wang ◽  
...  

2014 ◽  
Vol 08 (01) ◽  
pp. 85-98 ◽  
Author(s):  
G. Manning Richardson ◽  
Janet Bowers ◽  
A. John Woodill ◽  
Joseph R. Barr ◽  
Jean Mark Gawron ◽  
...  

This tutorial presents topic models for organizing and comparing documents. The technique and the corresponding discussion focus on the analysis of short text documents, particularly micro-blogs. However, the base topic model and R implementation are generally applicable to text analytics of document databases.


1966 ◽  
Vol 24 ◽  
pp. 188-189 ◽
Author(s):  
T. J. Deeming

If we make a set of measurements, such as narrow-band or multicolour photo-electric measurements, which are designed to improve a scheme of classification, and in particular if they are designed to extend the number of dimensions of classification, i.e. the number of classification parameters, then some important problems of analytical procedure arise. First, it is important not to reproduce the errors of the classification scheme which we are trying to improve. Second, when trying to extend the number of dimensions of classification we have little or nothing with which to test the validity of the new parameters.

Problems similar to these have occurred in other areas of scientific research (notably psychology and education) and the branch of Statistics called Multivariate Analysis has been developed to deal with them. The techniques of this subject are largely unknown to astronomers, but, if carefully applied, they should at the very least ensure that the astronomer gets the maximum amount of information out of his data and does not waste his time looking for information which is not there. More optimistically, these techniques are potentially capable of indicating the number of classification parameters necessary and giving specific formulas for computing them, as well as pinpointing those particular measurements which are most crucial for determining the classification parameters.
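
The abstract names multivariate analysis only in general terms. Principal component analysis is one such technique that does what the last sentence describes: its explained-variance ratios suggest how many classification parameters the data support, and its component loadings give explicit formulas for computing them. A minimal sketch on synthetic measurements standing in for multicolour photometry follows; the data and dimensions are invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for multicolour photometric measurements:
# 200 stars, 6 passbands, driven by 2 underlying physical parameters.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
measurements = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

pca = PCA().fit(measurements)
# Explained-variance ratios indicate how many classification
# parameters the data actually support (here, two dominate).
print(pca.explained_variance_ratio_.round(3))
# The component loadings give explicit formulas for computing the
# parameters as linear combinations of the measurements.
print(pca.components_[:2].round(2))
```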


2018 ◽  
Vol 15 ◽  
pp. 101-112 ◽
Author(s):  
So-Hyun Park ◽  
Ae-Rin Song ◽  
Young-Ho Park ◽  
Sun-Young Ihm
Keyword(s):  

2014 ◽  
Vol 4 (1) ◽  
pp. 29-45 ◽  
Author(s):  
Rami Ayadi ◽  
Mohsen Maraoui ◽  
Mounir Zrigui

In this paper, the authors present a latent topic model to index and represent Arabic text documents with richer semantics. Text representation in a language with highly inflectional morphology such as Arabic is not a trivial task and requires special treatment. The authors describe their approach for analyzing and preprocessing Arabic text and then describe the stemming process. Finally, the latent model (LDA) is adapted to extract Arabic latent topics: the authors extract significant topics from all texts, describe each topic by a particular distribution of descriptors, and then represent each text as a vector over these topics. The classification experiment is conducted on an in-house corpus; latent topics are learned with LDA for different topic numbers K (25, 50, 75, and 100), and the authors compare these results with classification in the full word space. The results show that, in terms of precision, recall, and F-measure, classification in the reduced topic space outperforms classification in the full word space as well as classification using LSI reduction.
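
As a minimal sketch of the evaluation idea above, classification in a reduced LDA topic space rather than the full word space, the example below uses a toy English corpus and K = 2 in place of the stemmed Arabic corpus and the paper's K values (25, 50, 75, 100).

```python
from gensim import corpora
from gensim.models import LdaModel
from sklearn.linear_model import LogisticRegression

# Toy tokenized documents with class labels; the real experiment uses
# a stemmed Arabic corpus and K in {25, 50, 75, 100}.
docs = [["market", "stock", "price"], ["team", "match", "goal"],
        ["trade", "market", "economy"], ["player", "goal", "league"]]
labels = [0, 1, 0, 1]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
K = 2  # toy topic count
lda = LdaModel(corpus, num_topics=K, id2word=dictionary, passes=20, random_state=0)

# Represent each document as a dense vector in the reduced topic space.
def topic_vector(bow):
    vec = [0.0] * K
    for topic, prob in lda.get_document_topics(bow, minimum_probability=0.0):
        vec[topic] = prob
    return vec

X = [topic_vector(bow) for bow in corpus]
clf = LogisticRegression().fit(X, labels)  # classify in topic space
print(clf.predict(X))
```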

