A Word Embedding Topic Model for Robust Inference of Topics and Visualization

2021 ◽  
Author(s):  
Sanuj Kumar ◽  
Tuan Le


2018 ◽
Vol 6 (3) ◽  
pp. 67-78
Author(s):  
Tian Nie ◽  
Yi Ding ◽  
Chen Zhao ◽  
Youchao Lin ◽  
Takehito Utsuro

This article addresses the issue of how to obtain an overview of the knowledge associated with a given query keyword, focusing in particular on the concerns of those who search for web pages with that keyword. The Web search information needs of a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant, and classify the redundant search engine suggests based on a topic model. However, one limitation of the topic-model-based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this coarse-grained classification, this article further applies the word embedding technique to the web pages used during the training of the topic model, in addition to the text of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between search engine suggests and classify the suggests within a single topic into finer-grained subtopics based on that similarity. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
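A minimal sketch of the finer-grained step (not the authors' implementation): suggests grouped under one coarse topic are re-clustered by the cosine similarity of their word embeddings. The tiny corpus, suggest list, and distance threshold below are hypothetical placeholders; the paper trains embeddings on Japanese Wikipedia plus the topic model's web pages.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import AgglomerativeClustering

# Toy stand-in for the Wikipedia + web page training corpus.
corpus = [["kyoto", "hotel", "ryokan", "stay"],
          ["kyoto", "bus", "timetable", "access"],
          ["kyoto", "onsen", "ryokan", "bath"]]
model = Word2Vec(corpus, vector_size=50, min_count=1, seed=0)

# Suggests that the topic model placed into one coarse topic.
suggests_in_topic = ["hotel", "ryokan", "onsen", "bus", "timetable"]
kept = [s for s in suggests_in_topic if s in model.wv]
vectors = np.array([model.wv[s] for s in kept])

# Suggests whose embeddings are close in cosine distance form one subtopic.
clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=0.8,
                                     metric="cosine", linkage="average")
print(dict(zip(kept, clustering.fit_predict(vectors))))
```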


2019 ◽  
Vol 52 (9-10) ◽  
pp. 1289-1298 ◽  
Author(s):  
Lei Shi ◽  
Gang Cheng ◽  
Shang-ru Xie ◽  
Gang Xie

The aim of topic detection is to automatically identify events and hot topics in social networks and to continuously track known topics. Traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis are difficult to apply given the high dimensionality of massive event texts and the short-text sparsity of social networks; the sparse distribution of topics also leads to unclear topics. To address these challenges, we propose a novel word embedding topic model, named the CBOW Topic Model (CTM), which combines a topic model with the continuous bag-of-words (CBOW) word embedding method for topic detection and summarization in social networks. We cluster similar words in the target social network text dataset by introducing the classic CBOW word vectorization method, which effectively learns the internal relationships between words and reduces the dimensionality of the input texts. We employ the topic model to model short texts, effectively weakening the sparsity problem of social network texts. To detect and summarize topics, we propose a topic detection method that leverages similarity computation for social networks. We collected a Sina microblog dataset to conduct various experiments. The experimental results demonstrate that CTM is superior to existing topic model methods.
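A rough sketch of the CBOW step described above, assuming gensim's Word2Vec with sg=0 as the CBOW implementation; `posts` is a hypothetical list of tokenized microblog posts standing in for the Sina dataset.

```python
from gensim.models import Word2Vec

posts = [["earthquake", "rescue", "donate", "relief"],
         ["earthquake", "relief", "supplies", "donate"],
         ["concert", "ticket", "tour", "band"]]

# sg=0 selects the continuous bag-of-words (CBOW) training mode.
cbow = Word2Vec(posts, vector_size=100, window=5, min_count=1, sg=0)

# Grouping each word with its nearest neighbours in the embedding space
# yields the "similar word" clusters used to shrink the input vocabulary.
print(cbow.wv.most_similar("earthquake", topn=3))
```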


2020 ◽  
Vol 125 (3) ◽  
pp. 2091-2108
Author(s):  
Jie Chen ◽  
Jialin Chen ◽  
Shu Zhao ◽  
Yanping Zhang ◽  
Jie Tang

Author(s):  
Mennatallah El-Assady ◽  
Rebecca Kehlbeck ◽  
Christopher Collins ◽  
Daniel Keim ◽  
Oliver Deussen

Author(s):  
Guangxu Xun ◽  
Yaliang Li ◽  
Wayne Xin Zhao ◽  
Jing Gao ◽  
Aidong Zhang

Conventional correlated topic models capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been shown to capture semantic regularities in language, so the semantic relatedness and correlations between words can be calculated directly in the word embedding space, for example via cosine similarity. In this paper, we propose a novel correlated topic model using word embeddings. The proposed model exploits the additional word-level correlation information in word embeddings and directly models topic correlation in the continuous word embedding space. In the model, words in documents are replaced with meaningful word embeddings, topics are modeled as multivariate Gaussian distributions over the word embeddings, and topic correlations are learned among the continuous Gaussian topics. A Gibbs sampling solution with data augmentation is given to perform inference. We evaluate our model qualitatively and quantitatively on the 20 Newsgroups and Reuters-21578 datasets, and the experimental results show the effectiveness of the proposed model.
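A simplified sketch of the generative view above: each topic is a multivariate Gaussian over word embeddings, and a word's topic responsibilities come from the topic densities at its embedding. The means and covariances below are toy values, not learned parameters, and the Gibbs sampler with data augmentation is not reproduced here.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

dim, n_topics = 50, 3
rng = np.random.default_rng(0)
topic_means = rng.normal(size=(n_topics, dim))      # toy Gaussian topic means
topic_covs = [np.eye(dim) for _ in range(n_topics)]  # toy topic covariances

word_vec = rng.normal(size=dim)  # embedding of one observed word

# Log-density of the word's embedding under each Gaussian topic.
log_dens = np.array([multivariate_normal.logpdf(word_vec, mean=m, cov=c)
                     for m, c in zip(topic_means, topic_covs)])
topic_probs = np.exp(log_dens - logsumexp(log_dens))  # normalize in log space
print(topic_probs)
```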


2020 ◽  
Author(s):  
Vijayarani J. ◽ 
Geetha T.V.

Social media texts such as tweets and blogs are collaboratively created through human interaction. Fast-changing trends lead to topic drift in social media text; this drift is usually associated with words and hashtags, but geotags also play an important part in determining topic distribution within a location context. The rates of change in the distributions of words, hashtags, and geotags are not uniform and must be handled accordingly. This paper builds a topic model that associates each topic with a mixture of distributions over words, hashtags, and geotags. A stochastic gradient Langevin dynamics model with varying mini-batch sizes is used to capture the changes due to the asynchronous distributions of words and tags. Topical word embeddings with co-occurrence and location contexts are specified as hashtag context vectors and geotag context vectors, respectively; these two vectors are jointly learned to yield topical word embedding vectors related to the tag contexts. Topical word embeddings over time, conditioned on hashtags and geotags, effectively predict location-based topical variations. When evaluated on Chennai and UK geolocated Twitter data, the proposed joint topical word embedding model, enhanced by the social tag context, outperforms other methods.
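A generic stochastic gradient Langevin dynamics (SGLD) update, sketched to illustrate the update style the paper builds on; the toy posterior, step-size schedule, and varying mini-batch sizes are placeholders for the paper's asynchronous word, hashtag, and geotag distributions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgld_step(theta, grad_log_post, step_size):
    """One SGLD update: a half step of gradient ascent on the log
    posterior plus Gaussian noise with variance equal to the step size."""
    noise = rng.normal(scale=np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + noise

theta = np.zeros(10)
for t, batch_size in enumerate([64, 128, 256, 512], start=1):
    # In the full model the gradient would be estimated from a mini-batch
    # of `batch_size` posts; here it is the exact gradient of log N(0, I).
    theta = sgld_step(theta, lambda th: -th, step_size=0.01 / t)
```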


2020 ◽  
Vol 10 (11) ◽  
pp. 3831 ◽  
Author(s):  
Sang-Min Park ◽  
Sung Joon Lee ◽  
Byung-Won On

Detecting the main aspects of a particular product from a collection of review documents is challenging in real applications. To address this problem, we focus on utilizing existing topic models, which can briefly summarize large text documents. Unlike existing approaches, which are limited because they modify a particular topic model or use seed opinion words as prior knowledge, we propose a novel approach of (1) identifying starting points for learning, (2) cleaning dirty topic results through word embedding and unsupervised clustering, and (3) automatically generating the right aspects using topic and head word embeddings. Experimental results show that the proposed methods create cleaner topics, improving ROUGE-1 by about 25% compared to the baseline method. In addition, through the three proposed methods, the main aspects suited to the given data are detected automatically.
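A minimal sketch of step (2) above: "cleaning" a dirty topic by clustering its top words' embeddings and keeping only the dominant cluster. The toy reviews and word list are hypothetical, and k-means stands in for whichever unsupervised clustering the authors used.

```python
import numpy as np
from collections import Counter
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

reviews = [["battery", "charge", "lasts", "screen"],
           ["screen", "bright", "battery", "drains"],
           ["shipping", "late", "box", "damaged"]]
wv = Word2Vec(reviews, vector_size=50, min_count=1, seed=0).wv

def clean_topic(words, wv, n_clusters=2):
    kept = [w for w in words if w in wv]
    vecs = np.array([wv[w] for w in kept])
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(vecs)
    dominant = Counter(labels).most_common(1)[0][0]
    # Words falling outside the dominant embedding cluster are dropped as noise.
    return [w for w, lab in zip(kept, labels) if lab == dominant]

print(clean_topic(["battery", "screen", "charge", "shipping"], wv))
```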


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3728 ◽  
Author(s):  
Zhou ◽  
Wang ◽  
Sun ◽  
Sun

Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which can lead to a lack of semantic information as well as problems of high dimensionality and high sparsity. A popular way to address these problems is to apply deep learning methods. In this paper, feature weighting, word embedding, and topic models are combined to propose an unsupervised text representation method named the feature, probability, and word embedding method. The main idea is to use the word embedding technique Word2Vec to obtain word vectors and then combine them with TF-IDF feature weighting and the LDA topic model. Compared with traditional feature engineering, the proposed method not only increases the expressive ability of the vector space model but also reduces the dimensionality of the document vectors, addressing the insufficient semantic information, high dimensionality, and high sparsity of BoW. We apply the proposed method to the task of text categorization and verify its validity.
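A sketch of one plausible reading of the combination above (not the paper's exact formulation): a document vector formed by the TF-IDF weighted average of Word2Vec word vectors, concatenated with the document's LDA topic distribution. The toy corpus is a placeholder.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["cats chase mice", "dogs chase cats", "stocks rise on strong earnings"]

w2v = Word2Vec([d.split() for d in docs], vector_size=50, min_count=1, seed=0)
tfidf = TfidfVectorizer().fit(docs)
count_vec = CountVectorizer().fit(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(count_vec.transform(docs))

def doc_vector(doc):
    weights = tfidf.transform([doc]).toarray()[0]
    vocab = tfidf.get_feature_names_out()
    # TF-IDF weighted average of the embeddings of words in the document.
    pairs = [(w2v.wv[w], weights[i]) for i, w in enumerate(vocab)
             if weights[i] > 0 and w in w2v.wv]
    avg = sum(v * wt for v, wt in pairs) / sum(wt for _, wt in pairs)
    topics = lda.transform(count_vec.transform([doc]))[0]  # LDA topic mixture
    return np.concatenate([avg, topics])

print(doc_vector("cats chase mice").shape)  # (50 + 2,)
```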

